Evaluation of Tools for Opinion Mining

Evaluation of Tools for Opinion Mining
Faschang Patrizia, Petz Gerald, Wimmer Markus, Dorfer Viktoria and Winkler Stephan
School of Management; Upper Austria University of Applied Sciences, Steyr, Austria
School of Informatics, Communications and Media; Upper Austria University of Applied Sciences,
Hagenberg, Austria
Abstract— Opinion Mining is a useful tool for gathering
information about customers’ opinions about products, brands or
companies. Much research in this area has been done in recent
years and a couple of tools and methods can be identified.
However, there are still challenges related to user generated
content or texts in German language. This paper proposes an
approach to combine existing methods and tools for analyzing the
sentiment orientation of text and evaluates these methods.
Index Terms— information retrieval, text mining, sentiment
analysis
I. INTRODUCTION
B
logs, wikis and social networks have become part of our
digital life and have changed the way we communicate
significantly. There are 255 million of websites on the
internet [1] and therefore a lot of facts and opinions are
available for companies and customers. Everybody is able to
publish subjective information about products, brands and
companies.
The exchange of information and opinions of consumers on
the Web 2.0 also means that a greater confidence in the
products, brands and services are created, which e.g. in the ecommerce-sector leads to a higher purchase probability [2],
[3]. Therefore it is very important that companies are able to
find, extract and analyze this user generated content, because
these contents contain significant “real” market relevant data.
Furthermore it is an easy and cheap way to gain a current
market overview to generate new strategic, tactical and
operational plans and policies as well as creating relevant
brand messages.
Opinion Mining deals with scientific methods in order to find,
extract, and systematically analyze product, company or brand
related views on the internet. The identification of sentiment
orientation (positive, neutral and negative) of consumers’
opinions is an essential part of the opinion mining process.
The objectives of this paper are to (i) combine existing
methods and tools for analyzing the sentiment orientation of
German text and (ii) to evaluate these methods in relation to
their applicability and accuracy.
II. RELATED WORK
Sentiment analysis has been studied by many researchers in
recent years. Several main research directions can be
identified:
(1) Sentiment classification based on document level: The
classification shall reveal whether the complete document has
a positive, negative or neutral sentiment orientation. [4], [5],
[6], [7] and many others have studied classification at
document level.
(2) Sentiment classification based on sentence level or on
feature level try to determine the sentiment orientation of each
sentence or feature; research work has been done in [8], [9],
[10], [11] and many others.
Most classification methods – both at document level and
sentence level –are based on identification of opinion words or
phrases. Two types of approaches are usually used: corpusbased approaches (e.g. [12], [6], [13]) and dictionary-based /
lexicon-based approaches (e.g. [8], [9], [14], [8], [15], [11]).
III. PROBLEM DEFINITION
Several research papers identified the challenges of sentiment
analysis, e.g. that opinion words are domain dependent and
furthermore the same word may even indicate different
opinions depending on the features. In our research work we
face several challenges:
(1) Complexity of the German Language
The German language is more complex (many synonyms for
both verbs, nouns and negators) than e.g. the English language;
therefore several widely used tools like stemmers are not
applicable. [16]
(2) Mistakes in User Generated Content
Further challenges arise from the fact that many texts that were
written by consumers are grammatically incorrect, contain
typing mistakes or are colloquial. [16]
(3) Lack of standardized test sets
In order to review and compare the quality of the results of
sentiment analysis, it is necessary to use a corpus using
standardized quantities. [17], [18]
contradictory arguments (e.g. the "star rating" and the textual
utterances differ from each other). Furthermore the spamproblem has to be taken into consideration in the processing
and analysis of texts. [19]
(4) Dealing with inconsistencies and spam
User generated content can contain inconsistent and
IV. TOOLS AND METHODS FOR SENTIMENT ANALYSIS
The following process model describes steps, methods and tools to find, extract and analyze web data with regard to their
sentiment orientation:
Figure 1. process model
The process model comprises the following steps:
(1) Selection of relevant data sources
Before content can be extracted from possible sources
such as blogs, forums, social networks, microblogging services etc., the relevant sources have to be
identified. The relevance is based on different criteria,
as for example the thematic aptitude or the range of
the source.
(2) Selection of relevant methods and tools to analyze the
data
In the next step appropriate methods for running the
different analyses has to be chosen. Examples are
ontologies, thesauri as well as different methods to
identify the semantic orientation of texts.
(3) Pre-processing and pre-structuring of the contents on
basis of the chosen methods
Because of heterogeneity of the content and the
sources as well as the varied quality of customer
written texts it is important to prepare and prestructure the extracted content. The target of this
process step is to identify irrelevant parts of websites,
blogs, forums, etc. like advertisements and remove
these valueless elements of the extracted content.
(4) Transformation of the text in standard and further
processed structure
The next process phase has its target in transforming
the extracted and prepared content with the help of
natural language processing and information retrieval
into a processable structure.
(5) Analyzing the content in relation to its semantic
orientation
Afterwards the sentiment analysis of texts to determine
if a sentence is positive, neutral or negative will be
performed. In the literature there are different
approaches for analyzing the sentimental orientation of
texts: manual, dictionary- and corpus-based
approaches.
(6) Evaluation of the methods and tools
Last but not least an evaluation of the used methods
and tools is done. This should verify if the used
methods and tools interact satisfactorily with each
other.
The following table outlines commonly used methods and
tools for each process step:
Step
selection of
relevant data
Methods (examples)
information retrieval on
the web[20], relevanceindex [3]
preprocessing
thesaurus, ontologies,
tokenizer, stemmer,
screenscrapper
transformation
Analysis
evaluation
part of speech tagger,
sentence splitter,
orthographic coreferences
classification methods
based on document or
sentence level (see
Related Work)
manual classification of
sentiment orientation
and feedback
Tools (examples)
webcrawler [21],
[22], RSS-feeds,
APIs for
gathering data
(e.g. [23])
RDF-OWL,
Alchemy API[24],
GATE [25],
UIMA [26],
GETESS [27, 28],
openthesaurus.de
[29], etc.
TreeTagger [30],
Sentence splitter,
orthographic coreferences
opinion observer
[31], RapidMiner
[32]
TABLE 1 METHODS AND TOOLS FOR PROCESSING AND ANALYZING TEXTS
V. EVALUATION
A. Build a Corpus
(1) Corpus
The first step was to create a topic related corpus of words.
This corpus represents the fundamental content, in our case
amazon reviews, plus different kind of ratings stored in a
database. The first version of the corpus contained 130
reviews including 1443 sentences. These reviews were
subdivided into four domains: beamer, hdmi cable, vacuum
cleaner and razor with a rating scale from minus three for very
poor to plus three for very good. There are two ratings at a
time per sentence and two ratings for the entire amazon
review. Every single rating was done manually by two
independent people.
The creation of a corpus is useful because of the usage of selflearning algorithms in the later following sentiment analysis
and due to the lack of corpuses in German language.
(2) Create Feature List
After developing the corpus it was necessary to create a
feature list which contains words that represent strong positive
or negative sentiments to identify the semantic orientation of
the text. This list was gathered by using a tool called
Treetagger [30] which tagged single words in the text and put
them to their principal form. After tagging the words, a simple
self-written tool was used to count each word of the content.
Adjectives with the highest frequency were selected and
gathered in a list. This represented the first version of the
feature list.
B. Transformation and Pre-Processing
Before the first test run with an analysis tool it was necessary
to convert the corpus and the feature list into proper shape.
Therefore a matrix was set up that contained the id of the
sentence in y direction and the features in x direction.
id
1
2
4
7
beautiful
1
0
0
1
Good
1
1
0
0
long
0
2
1
TABLE 1 - SAMPLE OF THE MATRIX
The matrix data represents the feature frequency within a
sentence; the feature is only listed if its frequency on the whole
text is greater than ten times. This matrix is used as input for
HeuristicLab Software Environment [33], [34]. WEKA does
not use the matrix view to process the data, it uses a list of
features, ratings and data in a form with a special notation.
[35]
C. Testing
1) Initial Run – Results from WEKA
The initial run was done with the WEKA tool, a data set of
130 reviews and a feature list containing 115 features was
tested with different algorithms.
WEKA
Correct
Incorrect
Best-First Decision Tree
FT tree
J48 tree
J48graft tree
LAD-Tree
Random Forest
REP-tree
CART-decision tree
49.22%
55.22%
48.88 %
49.11 %
55.33 %
55%
47.66 %
48.55 %
50.78%
44.78
51.12 %
50.89 %
44.67 %
45 %
52.34 %
51.45 %
TABLE 2 RESULTS INITIAL RUN
table 2 shows that the best results with the basic corpus and the
first feature list are between 50% and 55% correctly classified
instances.
2) Improvements - Results from HeuristicLab using a SVM
In order to improve the results, the number of reviews was
elevated to 260 with 4095 sentences overall. Furthermore a
function to check the words surrounding the features in a
defined radius was implemented to identify negations. The
number of features in this test case was reduced to 82 and the
corpus was tested with different settings of the WEKA Support
Vector Machine (SVM) [36] and a Logistic Model Tree
(LMT).
WEKA (radius 3)
Correct
Incorrect
SVM (best result)
LMT tree
69.01 %
69.40 %
30.99 %
30.60 %
TABLE 3 WEKA RADIUS 3
WEKA (radius 5)
SVM (best result)
LMT tree
Correct
68.27 %
71.80 %
Synonyms of pretty: beautiful, cheerful, cute…
Result: This car is pretty.
This tool recognizes that beautiful is a synonym of pretty and
replaces beautiful with pretty. Before the analysis with HL or
WEKA the content was aligned with the thesaurus tool. This
resulted in having many more sentences and features in our
matrix
.
The following SVM settings were used in HL:
Gamma:0.9
Cost: 0.6
Svmtyp: nu_SVR
Epsilon: 0.001
Nu: 0.7
After reducing the feature list there were 42 features left in the
list. The negative effect of reducing the feature list is shown in
table 6. However the HL software shows an improvement in
the correctly classified sentences in table 7.
WEKA (radius 3)
Correct
Incorrect
J48 tree
SVM(best result)
67.54 %
66,86 %
32.46 %
33,14 %
TABLE 6 – WEKA TEST RESULTS IN TEST RUN 3
HL (radius 3)
Incorrect
31.73 %
28.20 %
SVM(best result)
Correct
Incorrect
64,46%
35.54 %
TABLE 7 – HL TEST RESULTS IN TEST RUN 3
TABLE 4 WEKA RADIUS 5
table 3 and table 4 show the differences between the two tests
and the significant improvement of the correctly classified
instances. Radius 3 respectively radius 5 defines the word
distance between a feature word and a negator.
In addition to the Support Vector Machine in Heuristic Lab
Software Environment (HL), a linear regression was used to
analyze the text. These results are shown in table 5.
HL (radius 3)
Correct
Incorrect
SVM (best result)
Lin. Regression
59,53%
41,58%
40,47 %
58,42%
TABLE 5 HL RADIUS 3
3) Further Improvements – Usage of Thesaurus
The third test stage started by reducing the features again in
exchange by using a thesaurus to get a higher range of a single
feature.
Thesaurus example:
Text: This car is beautiful.
Features: big, pretty, sharp…
VI. CONCLUSION AND FURTHER WORK
In summarizing the above evaluation we can reason the
following statements:
The evaluated methods and tools proved to be good
choices and exposed promising results.
The enlargement of the trainings-dataset has a positive
influence on the precision of the sentiment analysis.
The optimization of the feature list has a positive effect
on the sentiment analysis. The optimization of the
feature list has been performed in two ways: (i) The
feature-list was reduced by eliminating useless words
and by specifying the words more precise with
reference to the domain. (ii) The usage of a thesaurus
reduces the feature list on the one hand, because
synonyms can be aggregated to one expression. On
the other hand more text can be analyzed and
evaluated regarding the sentiment orientation. The
reduction of the features has to be done carefully,
because it turned out that an excessive reduction of
the features has a negative influence on the results.
In order to improve the aforementioned results, this further
work will be carried out:
[5]
The corpus has to be further enlarged; in particular
other domains will be added to cover more areas of
interest.
The thesaurus usage can be improved due to colloquial
words and abbreviations in common use (e.g. smileys,
IMHO, etc.)
[6]
The optimization of the feature-list will be an
important task: in addition to the optimization tasks
mentioned above the feature-list can be adjusted to
the thesaurus, thus the sentiment analysis will be able
to perform better.
[7]
The researchers will carry out some fine-tuning of the
parameters used in SVM:
o gamma: 0.1 – 1 (using 0.1 steps)
o epsilon: 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1
[8]
o nu: 0.1 – 1 (using 0.1 steps)
o cost: 0.1 – 1 (using 0.1 steps), 5, 10
[9]
[10]
[11]
REFERENCES
[1]
[2]
[3]
[4]
T. Alby, Web 2.0: Konzepte, Anwendungen,
Technologien. München: Hanser, 2008.
B. Berendt, A. Hotho, D. Mladenic, M. van Someren,
M. Spiliopoulou, and G. Stumme, “A Roadmap for
Web Mining: From Web to Semantic Web,” in Hot
topics, vol. 3209, Web mining: from Web to semantic
Web: First European Web Mining Forum, EWMF
2003, Cavtat-Dubrovnik, Croatia, September 22, 2003
; invited and selected revised papers, B. Berendt, A.
Hotho, D. Mladenic, M. van Someren, M.
Spiliopoulou, and G. Stumme, Eds, Berlin: Springer,
2004, pp. 1–22.
C. Kaiser, “Opinion Mining im Web 2.0 – Konzept und
Fallbeispiel,” HMD - Praxis der Wirtschaftsinformatik,
vol. 46, no. 268, pp. 90–99, 2009.
K. Dave, S. Lawrence, and D. M. Pennock, “Mining the
Peanut Gallery: Opinion Extraction and Semantic
Classification of Product Reviews,” in WWW 2003: The
twelfth International World Wide Web ; Budapest
Convention Centre, 20-24 May 2003, Budapest,
Hungary ; proceedings, New York, NY: ACM, 2003,
pp. 519–528.
[12]
[13]
[14]
[15]
M. Gamon, A. Aue, S. Corston-Oliver, and E. Ringger,
“Pulse: Mining Customer Opinions from Free Text,” in
Lecture notes in computer science, vol. 3646, Advances
in intelligent data analysis VI: 6th International
Symposium on Intelligent Data Analysis, IDA 2005,
Madrid, Spain, September 8 - 10, 2005 ; proceedings,
A. F. Famili, J. N. Kok, J. M. Pena, A. Siebes, and A.
Feelders, Eds, Berlin: Springer, 2005, pp. 121–132.
P. D. Turney, “Thumbs Up or Thumbs Down?
Semantic Orientation Applied to Unsupervised
Classification of Reviews,” in Proceedings of the 40th
Annual Meeting of the Association for Computational
Linguistics, Philadelphia, PA, USA, 2002, pp. 417–
424.
B. Pang and L. Lee, “Seeing stars: Exploiting class
relationships for sentiment categorization with respect
to rating scales,” in Proceedings of the 43rd Annual
Meeting on Association for Computational Linguistics,
2005, pp. 115–124.
M. Hu and B. Liu, “Mining and summarizing customer
reviews,” in Proceedings of the tenth ACM SIGKDD
international conference on Knowledge discovery and
data mining, 2004, pp. 168–177.
S.-M. Kim and E. Hovy, “Determining the Sentiment of
Opinions,” in Proceedings of 20th International
Conference on Computational Linguistics, Geneva,
Switzerland, 2004, pp. 1367–1373.
T. Wilson, J. Wiebe, and R. Hwa, “Just How Mad Are
You? Finding Strong and Weak Opinion Clauses,” in
Proceedings of the Nineteenth National Conference on
Artificial Intelligence: July 25 - 29, 2004, San Jose,
California, Menlo Park, Calif.: AAAI Press/MIT Press,
2004, pp. 761–767.
A.-M. Popescu and O. Etzioni, “Extracting Product
Features and Opinions from Reviews,” in Proceedings
of Human Language Technology Conference and
Conference on Empirical Methods in Natural
Language Processing, 2005, pp. 339–346.
V. Hatzivassiloglou and J. Wiebe, “Effects of Adjective
Orientation and Gradability on Sentence Subjectivity,”
in Proceedings of the 18th conference on
Computational linguistics, 2000, pp. 299–305.
J. Wiebe and R. Mihalcea, “Word Sense and
Subjectivity,” in Proceedings of the 21st International
Conference on Computational Linguistics and the 44th
annual meeting of the Association for Computational
Linguistics, 2006, pp. 1065–1072.
X. Ding, B. Liu, and P. S. Yu, “A Holistic LexiconBased Approach to Opinion Mining,” in International
Conference on Web Search & Data Mining: Palo Alto,
California, Feb. 11 - 12, 2008, New York, NY: ACM,
2008.
H. Kanayama and T. Nasukawa, “Fully automatic
lexicon expansion for domain-oriented sentiment
analysis,” in Proceedings of the 2006 Conference on
Empirical Methods in Natural Language Processing,
Stroudsburg, PA, USA, 2006, pp. 355–363.
G. Schneider and H. Zimmermann, “Text-MiningMethoden im Semantic Web,” HMD - Praxis der
Wirtschaftsinformatik, vol. 47, no. 271, pp. 35–46,
2010.
[17] C. Feilmayr and B. Pröll, “Ontologiebasierte
Informationsextraktion im eTourismus,” HMD - Praxis
der Wirtschaftsinformatik, vol. 46, no. 270, pp. 63–72,
2009.
[18] C. Ziegler, “Sentiment Detection: maschinelles
Textverständnis: Die Vermessung der Meinung,” iX Magazin für professionelle In-formationstechnik, no.
10, pp. 106–112, 2006.
[19] K. Sohns and M. H. Breitner, “Wettbewerbsvorteil
durch Online Content Mining,” HMD - Praxis der
Wirtschaftsinformatik, vol. 46, no. 270, pp. 54–62,
2009.
[20] M. Kobayashi and K. Takeda, “Information Retrieval
on the Web,” ACM Computing Surveys (CSUR), vol.
32, no. 2, pp. 144–173, 2000.
[21] P. Dahiwale, A. Mokhade, and M. M. Raghuwanshi,
“Intelligent Web Crawler,” in Proceedings of the
International Conference and Workshop on Emerging
Trends in Technology, 2010, pp. 613–617.
[22] J. Eno, S. Gauch, and C. Thompson, “Intelligent
Crawling in Virtual Worlds,” in Proceedings of the
2009 IEEE/WIC/ACM International Joint Conference
on Web Intelligence and Intelligent Agent Technology
/// IEEE/WIC/ACM International Joint Conferences on
Web Intelligence and Intelligent Agent Technologies,
2009: WI-IAT '09 ; 15 - 18 Sept. 2009, Milano, Italy ;
proceedings, Piscataway, NJ: IEEE, 2009, pp. 555–
558.
[23] Facebook Developers. Available:
http://developers.facebook.com/docs/ (2011, Mar. 10).
[24] Alchemy API. Available: http://www.alchemyapi.com/
(2001, Mar. 10).
[25] H. Cunningham, D. Maynard, K. Bontcheva, and V.
Tablan, “GATE: A Framework and Graphical
Development Environment for Robust NLP Tools and
Applications,” in Proceedings of the 40th Anniversary
Meeting of the Association for Computational
Linguistics, 2002.
[26] Apache Software Foundation, UIMA. Available:
http://uima.apache.org/ (2011, Mar. 10).
[27] I. Bruder, A. Düsterhöft, M. Becker, J. Bedersdorfer,
and G. Neumann, “GETESS: Constructing a Linguistic
Search Index for an Internet Search Engine,” in Lecture
notes in computer science, Natural Language
Processing and Information Systems, Berlin,
Heidelberg: Springer, 2001, pp. 227–238.
[28] S. Staab, C. Braun, I. Bruder, A. Düsterhöft, A. Heuer,
M. Klettke, G. Neumann, B. Prager, J. Pretzel, H.-P.
Schnurr, R. Studer, H. Uszkoreit, and B. Wrenger,
“GETESS: searching the web exploiting german texts,”
in Proceedings of the 3rd international conference on
Cooperative information agents /// Cooperative
information agents III: Third International Workshop,
[16]
[29]
[30]
[31]
[32]
[33]
[34]
[35]
[36]
CIA'99, Uppsala, Sweden, July 31-August 2,
1999proceedings, Berlin New York: Springer, 1999.
openthesaurus.de. Available:
http://www.openthesaurus.de/ (2011, Mar. 10).
H. Schmid, “TreeTagger - a language independent partof-speech tagger,”
B. Liu, M. Hu, and J. Cheng, “Opinion observer:
analyzing and comparing opinions on the Web,” in
Proceedings of the 14th international conference on
World Wide Web, New York, NY, USA, 2005, pp.
342–351.
F. Jungermann, “Information Extraction with
RapidMiner,” in Proceedings of the GSCL Symposium
'Sprachtechnologie und eHumanities', 2009.
S. Wagner, “Heuristic Optimization Software Systems Modeling of Heuristic Optimization Algorithms in the
HeuristicLab Software Environment,” PhD thesis,
Johannes Kepler University Linz, Linz, 2009.
“HeuristicLab Development Homepage,” Heuristic and
Evolutionary Algorithms Laboratory, Upper Austria
University of Applied Sciences.
M. Hall, E. Frank, G. Holmes, B. Pfahringer, P.
Reutemann, and I. H. Witten, “The WEKA Data
Mining Software: An Update,” ACM SIGKDD
Explorations Newsletter, vol. 11, no. 1, pp. 10–18,
2009.
V. N. Vapnik, Statistical learning theory. New York:
Wiley, 1998.
AUTHORS BIOGRAPHIES
PATRIZIA FASCHANG received her bachelor degree in
electronic business and is a junior researcher at the
Research Center Steyr, School of Management, in
the area of digital economy. She has worked on
several small projects in the field of marketing and
electronic business and is currently working on
Opinion Mining and Web 2.0 methods within the
TSCHECHOW project.
GERALD PETZ is director of the degree course "Marketing
and Electronic Business" at the University of
Applied Sciences in Upper Austria, School of
Management in Steyr. His main research areas
are Web 2.0, electronic business and electronic
marketing; he has also conducted several R&D
projects in these research fields. Before starting
his academic career he was project manager and CEO of an
internet company.
MARKUS WIMMER received his bachelor degree in
communication and knowledge media. He is a
junior researcher in the area of digital economy
at the Research Center Steyr, School of
Management. Before his academic career at the
University of Applied Science in upper Austria,
he was a flash developer of a marketing and internet company.
Currently he is working on Opinion Mining and Web 2.0
methods within the OPMIN 2.0 project.
VIKTORIA DORFER is a senior researcher in the field of
Bioinformatics at the Research Center
Hagenberg,
School
of
Informatics,
Communications and Media. After finishing the
diploma degree of bioinformatics in 2007 she
was a team member of various projects in the
field of informatics and bioinformatics. She is
currently working on information retrieval within the
TSCHECHOW project.
STEPHAN M. WINKLER received his MSc in computer
science in 2004 and his PhD in engineering
sciences in 2008, both from Johannes Kepler
University (JKU) Linz, Austria. His research
interests include genetic programming, nonlinear
model identification and machine learning. Since
2009, Dr. Winkler is professor at the Department
for Medical and Bioinformatics at the Upper Austria
University of Applied Sciences, Campus Hagenberg.