Evaluation of Tools for Opinion Mining Faschang Patrizia, Petz Gerald, Wimmer Markus, Dorfer Viktoria and Winkler Stephan School of Management; Upper Austria University of Applied Sciences, Steyr, Austria School of Informatics, Communications and Media; Upper Austria University of Applied Sciences, Hagenberg, Austria Abstract— Opinion Mining is a useful tool for gathering information about customers’ opinions about products, brands or companies. Much research in this area has been done in recent years and a couple of tools and methods can be identified. However, there are still challenges related to user generated content or texts in German language. This paper proposes an approach to combine existing methods and tools for analyzing the sentiment orientation of text and evaluates these methods. Index Terms— information retrieval, text mining, sentiment analysis I. INTRODUCTION B logs, wikis and social networks have become part of our digital life and have changed the way we communicate significantly. There are 255 million of websites on the internet [1] and therefore a lot of facts and opinions are available for companies and customers. Everybody is able to publish subjective information about products, brands and companies. The exchange of information and opinions of consumers on the Web 2.0 also means that a greater confidence in the products, brands and services are created, which e.g. in the ecommerce-sector leads to a higher purchase probability [2], [3]. Therefore it is very important that companies are able to find, extract and analyze this user generated content, because these contents contain significant “real” market relevant data. Furthermore it is an easy and cheap way to gain a current market overview to generate new strategic, tactical and operational plans and policies as well as creating relevant brand messages. Opinion Mining deals with scientific methods in order to find, extract, and systematically analyze product, company or brand related views on the internet. The identification of sentiment orientation (positive, neutral and negative) of consumers’ opinions is an essential part of the opinion mining process. The objectives of this paper are to (i) combine existing methods and tools for analyzing the sentiment orientation of German text and (ii) to evaluate these methods in relation to their applicability and accuracy. II. RELATED WORK Sentiment analysis has been studied by many researchers in recent years. Several main research directions can be identified: (1) Sentiment classification based on document level: The classification shall reveal whether the complete document has a positive, negative or neutral sentiment orientation. [4], [5], [6], [7] and many others have studied classification at document level. (2) Sentiment classification based on sentence level or on feature level try to determine the sentiment orientation of each sentence or feature; research work has been done in [8], [9], [10], [11] and many others. Most classification methods – both at document level and sentence level –are based on identification of opinion words or phrases. Two types of approaches are usually used: corpusbased approaches (e.g. [12], [6], [13]) and dictionary-based / lexicon-based approaches (e.g. [8], [9], [14], [8], [15], [11]). III. PROBLEM DEFINITION Several research papers identified the challenges of sentiment analysis, e.g. that opinion words are domain dependent and furthermore the same word may even indicate different opinions depending on the features. In our research work we face several challenges: (1) Complexity of the German Language The German language is more complex (many synonyms for both verbs, nouns and negators) than e.g. the English language; therefore several widely used tools like stemmers are not applicable. [16] (2) Mistakes in User Generated Content Further challenges arise from the fact that many texts that were written by consumers are grammatically incorrect, contain typing mistakes or are colloquial. [16] (3) Lack of standardized test sets In order to review and compare the quality of the results of sentiment analysis, it is necessary to use a corpus using standardized quantities. [17], [18] contradictory arguments (e.g. the "star rating" and the textual utterances differ from each other). Furthermore the spamproblem has to be taken into consideration in the processing and analysis of texts. [19] (4) Dealing with inconsistencies and spam User generated content can contain inconsistent and IV. TOOLS AND METHODS FOR SENTIMENT ANALYSIS The following process model describes steps, methods and tools to find, extract and analyze web data with regard to their sentiment orientation: Figure 1. process model The process model comprises the following steps: (1) Selection of relevant data sources Before content can be extracted from possible sources such as blogs, forums, social networks, microblogging services etc., the relevant sources have to be identified. The relevance is based on different criteria, as for example the thematic aptitude or the range of the source. (2) Selection of relevant methods and tools to analyze the data In the next step appropriate methods for running the different analyses has to be chosen. Examples are ontologies, thesauri as well as different methods to identify the semantic orientation of texts. (3) Pre-processing and pre-structuring of the contents on basis of the chosen methods Because of heterogeneity of the content and the sources as well as the varied quality of customer written texts it is important to prepare and prestructure the extracted content. The target of this process step is to identify irrelevant parts of websites, blogs, forums, etc. like advertisements and remove these valueless elements of the extracted content. (4) Transformation of the text in standard and further processed structure The next process phase has its target in transforming the extracted and prepared content with the help of natural language processing and information retrieval into a processable structure. (5) Analyzing the content in relation to its semantic orientation Afterwards the sentiment analysis of texts to determine if a sentence is positive, neutral or negative will be performed. In the literature there are different approaches for analyzing the sentimental orientation of texts: manual, dictionary- and corpus-based approaches. (6) Evaluation of the methods and tools Last but not least an evaluation of the used methods and tools is done. This should verify if the used methods and tools interact satisfactorily with each other. The following table outlines commonly used methods and tools for each process step: Step selection of relevant data Methods (examples) information retrieval on the web[20], relevanceindex [3] preprocessing thesaurus, ontologies, tokenizer, stemmer, screenscrapper transformation Analysis evaluation part of speech tagger, sentence splitter, orthographic coreferences classification methods based on document or sentence level (see Related Work) manual classification of sentiment orientation and feedback Tools (examples) webcrawler [21], [22], RSS-feeds, APIs for gathering data (e.g. [23]) RDF-OWL, Alchemy API[24], GATE [25], UIMA [26], GETESS [27, 28], openthesaurus.de [29], etc. TreeTagger [30], Sentence splitter, orthographic coreferences opinion observer [31], RapidMiner [32] TABLE 1 METHODS AND TOOLS FOR PROCESSING AND ANALYZING TEXTS V. EVALUATION A. Build a Corpus (1) Corpus The first step was to create a topic related corpus of words. This corpus represents the fundamental content, in our case amazon reviews, plus different kind of ratings stored in a database. The first version of the corpus contained 130 reviews including 1443 sentences. These reviews were subdivided into four domains: beamer, hdmi cable, vacuum cleaner and razor with a rating scale from minus three for very poor to plus three for very good. There are two ratings at a time per sentence and two ratings for the entire amazon review. Every single rating was done manually by two independent people. The creation of a corpus is useful because of the usage of selflearning algorithms in the later following sentiment analysis and due to the lack of corpuses in German language. (2) Create Feature List After developing the corpus it was necessary to create a feature list which contains words that represent strong positive or negative sentiments to identify the semantic orientation of the text. This list was gathered by using a tool called Treetagger [30] which tagged single words in the text and put them to their principal form. After tagging the words, a simple self-written tool was used to count each word of the content. Adjectives with the highest frequency were selected and gathered in a list. This represented the first version of the feature list. B. Transformation and Pre-Processing Before the first test run with an analysis tool it was necessary to convert the corpus and the feature list into proper shape. Therefore a matrix was set up that contained the id of the sentence in y direction and the features in x direction. id 1 2 4 7 beautiful 1 0 0 1 Good 1 1 0 0 long 0 2 1 TABLE 1 - SAMPLE OF THE MATRIX The matrix data represents the feature frequency within a sentence; the feature is only listed if its frequency on the whole text is greater than ten times. This matrix is used as input for HeuristicLab Software Environment [33], [34]. WEKA does not use the matrix view to process the data, it uses a list of features, ratings and data in a form with a special notation. [35] C. Testing 1) Initial Run – Results from WEKA The initial run was done with the WEKA tool, a data set of 130 reviews and a feature list containing 115 features was tested with different algorithms. WEKA Correct Incorrect Best-First Decision Tree FT tree J48 tree J48graft tree LAD-Tree Random Forest REP-tree CART-decision tree 49.22% 55.22% 48.88 % 49.11 % 55.33 % 55% 47.66 % 48.55 % 50.78% 44.78 51.12 % 50.89 % 44.67 % 45 % 52.34 % 51.45 % TABLE 2 RESULTS INITIAL RUN table 2 shows that the best results with the basic corpus and the first feature list are between 50% and 55% correctly classified instances. 2) Improvements - Results from HeuristicLab using a SVM In order to improve the results, the number of reviews was elevated to 260 with 4095 sentences overall. Furthermore a function to check the words surrounding the features in a defined radius was implemented to identify negations. The number of features in this test case was reduced to 82 and the corpus was tested with different settings of the WEKA Support Vector Machine (SVM) [36] and a Logistic Model Tree (LMT). WEKA (radius 3) Correct Incorrect SVM (best result) LMT tree 69.01 % 69.40 % 30.99 % 30.60 % TABLE 3 WEKA RADIUS 3 WEKA (radius 5) SVM (best result) LMT tree Correct 68.27 % 71.80 % Synonyms of pretty: beautiful, cheerful, cute… Result: This car is pretty. This tool recognizes that beautiful is a synonym of pretty and replaces beautiful with pretty. Before the analysis with HL or WEKA the content was aligned with the thesaurus tool. This resulted in having many more sentences and features in our matrix . The following SVM settings were used in HL: Gamma:0.9 Cost: 0.6 Svmtyp: nu_SVR Epsilon: 0.001 Nu: 0.7 After reducing the feature list there were 42 features left in the list. The negative effect of reducing the feature list is shown in table 6. However the HL software shows an improvement in the correctly classified sentences in table 7. WEKA (radius 3) Correct Incorrect J48 tree SVM(best result) 67.54 % 66,86 % 32.46 % 33,14 % TABLE 6 – WEKA TEST RESULTS IN TEST RUN 3 HL (radius 3) Incorrect 31.73 % 28.20 % SVM(best result) Correct Incorrect 64,46% 35.54 % TABLE 7 – HL TEST RESULTS IN TEST RUN 3 TABLE 4 WEKA RADIUS 5 table 3 and table 4 show the differences between the two tests and the significant improvement of the correctly classified instances. Radius 3 respectively radius 5 defines the word distance between a feature word and a negator. In addition to the Support Vector Machine in Heuristic Lab Software Environment (HL), a linear regression was used to analyze the text. These results are shown in table 5. HL (radius 3) Correct Incorrect SVM (best result) Lin. Regression 59,53% 41,58% 40,47 % 58,42% TABLE 5 HL RADIUS 3 3) Further Improvements – Usage of Thesaurus The third test stage started by reducing the features again in exchange by using a thesaurus to get a higher range of a single feature. Thesaurus example: Text: This car is beautiful. Features: big, pretty, sharp… VI. CONCLUSION AND FURTHER WORK In summarizing the above evaluation we can reason the following statements: The evaluated methods and tools proved to be good choices and exposed promising results. The enlargement of the trainings-dataset has a positive influence on the precision of the sentiment analysis. The optimization of the feature list has a positive effect on the sentiment analysis. The optimization of the feature list has been performed in two ways: (i) The feature-list was reduced by eliminating useless words and by specifying the words more precise with reference to the domain. (ii) The usage of a thesaurus reduces the feature list on the one hand, because synonyms can be aggregated to one expression. On the other hand more text can be analyzed and evaluated regarding the sentiment orientation. The reduction of the features has to be done carefully, because it turned out that an excessive reduction of the features has a negative influence on the results. In order to improve the aforementioned results, this further work will be carried out: [5] The corpus has to be further enlarged; in particular other domains will be added to cover more areas of interest. The thesaurus usage can be improved due to colloquial words and abbreviations in common use (e.g. smileys, IMHO, etc.) [6] The optimization of the feature-list will be an important task: in addition to the optimization tasks mentioned above the feature-list can be adjusted to the thesaurus, thus the sentiment analysis will be able to perform better. [7] The researchers will carry out some fine-tuning of the parameters used in SVM: o gamma: 0.1 – 1 (using 0.1 steps) o epsilon: 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1 [8] o nu: 0.1 – 1 (using 0.1 steps) o cost: 0.1 – 1 (using 0.1 steps), 5, 10 [9] [10] [11] REFERENCES [1] [2] [3] [4] T. Alby, Web 2.0: Konzepte, Anwendungen, Technologien. München: Hanser, 2008. B. Berendt, A. Hotho, D. Mladenic, M. van Someren, M. Spiliopoulou, and G. Stumme, “A Roadmap for Web Mining: From Web to Semantic Web,” in Hot topics, vol. 3209, Web mining: from Web to semantic Web: First European Web Mining Forum, EWMF 2003, Cavtat-Dubrovnik, Croatia, September 22, 2003 ; invited and selected revised papers, B. Berendt, A. Hotho, D. Mladenic, M. van Someren, M. Spiliopoulou, and G. Stumme, Eds, Berlin: Springer, 2004, pp. 1–22. C. Kaiser, “Opinion Mining im Web 2.0 – Konzept und Fallbeispiel,” HMD - Praxis der Wirtschaftsinformatik, vol. 46, no. 268, pp. 90–99, 2009. K. Dave, S. Lawrence, and D. M. Pennock, “Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews,” in WWW 2003: The twelfth International World Wide Web ; Budapest Convention Centre, 20-24 May 2003, Budapest, Hungary ; proceedings, New York, NY: ACM, 2003, pp. 519–528. [12] [13] [14] [15] M. Gamon, A. Aue, S. Corston-Oliver, and E. Ringger, “Pulse: Mining Customer Opinions from Free Text,” in Lecture notes in computer science, vol. 3646, Advances in intelligent data analysis VI: 6th International Symposium on Intelligent Data Analysis, IDA 2005, Madrid, Spain, September 8 - 10, 2005 ; proceedings, A. F. Famili, J. N. Kok, J. M. Pena, A. Siebes, and A. Feelders, Eds, Berlin: Springer, 2005, pp. 121–132. P. D. Turney, “Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews,” in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, 2002, pp. 417– 424. B. Pang and L. Lee, “Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales,” in Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, 2005, pp. 115–124. M. Hu and B. Liu, “Mining and summarizing customer reviews,” in Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, 2004, pp. 168–177. S.-M. Kim and E. Hovy, “Determining the Sentiment of Opinions,” in Proceedings of 20th International Conference on Computational Linguistics, Geneva, Switzerland, 2004, pp. 1367–1373. T. Wilson, J. Wiebe, and R. Hwa, “Just How Mad Are You? Finding Strong and Weak Opinion Clauses,” in Proceedings of the Nineteenth National Conference on Artificial Intelligence: July 25 - 29, 2004, San Jose, California, Menlo Park, Calif.: AAAI Press/MIT Press, 2004, pp. 761–767. A.-M. Popescu and O. Etzioni, “Extracting Product Features and Opinions from Reviews,” in Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, 2005, pp. 339–346. V. Hatzivassiloglou and J. Wiebe, “Effects of Adjective Orientation and Gradability on Sentence Subjectivity,” in Proceedings of the 18th conference on Computational linguistics, 2000, pp. 299–305. J. Wiebe and R. Mihalcea, “Word Sense and Subjectivity,” in Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, 2006, pp. 1065–1072. X. Ding, B. Liu, and P. S. Yu, “A Holistic LexiconBased Approach to Opinion Mining,” in International Conference on Web Search & Data Mining: Palo Alto, California, Feb. 11 - 12, 2008, New York, NY: ACM, 2008. H. Kanayama and T. Nasukawa, “Fully automatic lexicon expansion for domain-oriented sentiment analysis,” in Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, Stroudsburg, PA, USA, 2006, pp. 355–363. G. Schneider and H. Zimmermann, “Text-MiningMethoden im Semantic Web,” HMD - Praxis der Wirtschaftsinformatik, vol. 47, no. 271, pp. 35–46, 2010. [17] C. Feilmayr and B. Pröll, “Ontologiebasierte Informationsextraktion im eTourismus,” HMD - Praxis der Wirtschaftsinformatik, vol. 46, no. 270, pp. 63–72, 2009. [18] C. Ziegler, “Sentiment Detection: maschinelles Textverständnis: Die Vermessung der Meinung,” iX Magazin für professionelle In-formationstechnik, no. 10, pp. 106–112, 2006. [19] K. Sohns and M. H. Breitner, “Wettbewerbsvorteil durch Online Content Mining,” HMD - Praxis der Wirtschaftsinformatik, vol. 46, no. 270, pp. 54–62, 2009. [20] M. Kobayashi and K. Takeda, “Information Retrieval on the Web,” ACM Computing Surveys (CSUR), vol. 32, no. 2, pp. 144–173, 2000. [21] P. Dahiwale, A. Mokhade, and M. M. Raghuwanshi, “Intelligent Web Crawler,” in Proceedings of the International Conference and Workshop on Emerging Trends in Technology, 2010, pp. 613–617. [22] J. Eno, S. Gauch, and C. Thompson, “Intelligent Crawling in Virtual Worlds,” in Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology /// IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technologies, 2009: WI-IAT '09 ; 15 - 18 Sept. 2009, Milano, Italy ; proceedings, Piscataway, NJ: IEEE, 2009, pp. 555– 558. [23] Facebook Developers. Available: http://developers.facebook.com/docs/ (2011, Mar. 10). [24] Alchemy API. Available: http://www.alchemyapi.com/ (2001, Mar. 10). [25] H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan, “GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications,” in Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics, 2002. [26] Apache Software Foundation, UIMA. Available: http://uima.apache.org/ (2011, Mar. 10). [27] I. Bruder, A. Düsterhöft, M. Becker, J. Bedersdorfer, and G. Neumann, “GETESS: Constructing a Linguistic Search Index for an Internet Search Engine,” in Lecture notes in computer science, Natural Language Processing and Information Systems, Berlin, Heidelberg: Springer, 2001, pp. 227–238. [28] S. Staab, C. Braun, I. Bruder, A. Düsterhöft, A. Heuer, M. Klettke, G. Neumann, B. Prager, J. Pretzel, H.-P. Schnurr, R. Studer, H. Uszkoreit, and B. Wrenger, “GETESS: searching the web exploiting german texts,” in Proceedings of the 3rd international conference on Cooperative information agents /// Cooperative information agents III: Third International Workshop, [16] [29] [30] [31] [32] [33] [34] [35] [36] CIA'99, Uppsala, Sweden, July 31-August 2, 1999proceedings, Berlin New York: Springer, 1999. openthesaurus.de. Available: http://www.openthesaurus.de/ (2011, Mar. 10). H. Schmid, “TreeTagger - a language independent partof-speech tagger,” B. Liu, M. Hu, and J. Cheng, “Opinion observer: analyzing and comparing opinions on the Web,” in Proceedings of the 14th international conference on World Wide Web, New York, NY, USA, 2005, pp. 342–351. F. Jungermann, “Information Extraction with RapidMiner,” in Proceedings of the GSCL Symposium 'Sprachtechnologie und eHumanities', 2009. S. Wagner, “Heuristic Optimization Software Systems Modeling of Heuristic Optimization Algorithms in the HeuristicLab Software Environment,” PhD thesis, Johannes Kepler University Linz, Linz, 2009. “HeuristicLab Development Homepage,” Heuristic and Evolutionary Algorithms Laboratory, Upper Austria University of Applied Sciences. M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, “The WEKA Data Mining Software: An Update,” ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009. V. N. Vapnik, Statistical learning theory. New York: Wiley, 1998. AUTHORS BIOGRAPHIES PATRIZIA FASCHANG received her bachelor degree in electronic business and is a junior researcher at the Research Center Steyr, School of Management, in the area of digital economy. She has worked on several small projects in the field of marketing and electronic business and is currently working on Opinion Mining and Web 2.0 methods within the TSCHECHOW project. GERALD PETZ is director of the degree course "Marketing and Electronic Business" at the University of Applied Sciences in Upper Austria, School of Management in Steyr. His main research areas are Web 2.0, electronic business and electronic marketing; he has also conducted several R&D projects in these research fields. Before starting his academic career he was project manager and CEO of an internet company. MARKUS WIMMER received his bachelor degree in communication and knowledge media. He is a junior researcher in the area of digital economy at the Research Center Steyr, School of Management. Before his academic career at the University of Applied Science in upper Austria, he was a flash developer of a marketing and internet company. Currently he is working on Opinion Mining and Web 2.0 methods within the OPMIN 2.0 project. VIKTORIA DORFER is a senior researcher in the field of Bioinformatics at the Research Center Hagenberg, School of Informatics, Communications and Media. After finishing the diploma degree of bioinformatics in 2007 she was a team member of various projects in the field of informatics and bioinformatics. She is currently working on information retrieval within the TSCHECHOW project. STEPHAN M. WINKLER received his MSc in computer science in 2004 and his PhD in engineering sciences in 2008, both from Johannes Kepler University (JKU) Linz, Austria. His research interests include genetic programming, nonlinear model identification and machine learning. Since 2009, Dr. Winkler is professor at the Department for Medical and Bioinformatics at the Upper Austria University of Applied Sciences, Campus Hagenberg.
© Copyright 2026 Paperzz