DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2016

Method performance difference of sentiment analysis on social media databases
Sentiment classification in social media

HENRIK JOHANSSON
ANTON LILJA

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

Degree Project in Computer Science, DD143X
Supervisor: Richard Glassey
Examiner: Örjan Ekeberg
CSC, KTH 2016-05-11

Abstract

As the amount of available data has exploded with the increased use of social media, the interest in doing sentiment analysis has grown. However, as the source and nature of the data have changed, it is possible that the known methods will not perform as before. The purpose of this paper is to examine whether such a difference exists and whether the methods can be improved by preprocessing the data. The results show that there is a difference, and that on this new type of data a lexicon approach may be a better choice than a machine learning based one. Preprocessing the data gives some, but no large, improvement.

Contents

1 Introduction
  1.1 Problem statement
    1.1.1 Scope
  1.2 Purpose
2 Background
  2.1 Sentiment Analysis
  2.2 Classification
  2.3 Lexicon
  2.4 Machine learning
    2.4.1 Preprocessing
  2.5 Data
    2.5.1 Social media
    2.5.2 Datasets
  2.6 Tools
    2.6.1 NLTK
    2.6.2 Scikit-learn
    2.6.3 AFINN
3 Method
  3.1 Data formatting
    3.1.1 Data requirements
    3.1.2 Formatting
    3.1.3 Preprocessing
  3.2 Testing procedure
    3.2.1 Lexicon
    3.2.2 Machine learning
4 Results
5 Discussion
  5.1 Machine learning
  5.2 Lexicon
  5.3 Implementation
  5.4 Applications and future research
6 Conclusion
Bibliography
Chapter 1

Introduction

In recent years the amount of available data has exploded as a result of the ever increasing use of social media. Today millions upon millions of thoughts, ideas and opinions are shared online. This opens up the possibility of getting a glimpse of what people talk and think about. On its own the data is worthless, but it can be a goldmine if properly analysed. Extracting the general feeling, the sentiment, of a text can be interesting for both research and business purposes. For instance, a company may easily track the popularity of its brand, providing feedback on advertisement campaigns. The potential of the field has earned it significant interest over a long period of time, stretching back further than the use of social media. Early research focused on other sources of data, most commonly customer reviews online. User reviews have the benefit of often having both a text and a grade, providing the correct answers to a researcher. However, there is a great deal of difference between most online interaction and the text found in a user review. This is reason to question old truths about which algorithms and settings are most effective in classifying data by the feelings it conveys.

1.1 Problem statement

There is no known method to correctly extract the feelings in a text. Even humans sometimes misunderstand each other and fail to accurately pick up the subtle cues which convey the feelings in a text. As this is an imprecise field, the accuracy of a method is of high importance.

Previous thesis work on method and algorithm accuracy (Hindersson & Lousseief 2015) suggested the question of which methods achieve higher accuracy depending on the social media dataset used. This study aims to answer the question posed, and to be able to answer it better the following problem statement is used:

• Does the accuracy of the machine learning and lexicon based sentiment classification methods differ depending on what source of data is used?
  – If there is a difference in accuracy between the datasets, why is it so?
  – Is it possible to improve the accuracy of the different methods?

1.1.1 Scope

Not every method and social media database can be tested, and some limitations are therefore needed to define the scope. There are many different methods, databases, settings and tunings that could be used to answer the questions in the problem statement. Trying every existing method would be too time consuming, both concerning implementation and runtime. Therefore this study will only try a limited number of databases, methods and settings.

The databases will be limited to freely available, already annotated databases of social media posts and reviews in English. We do not have the time or resources to collect data and annotate it ourselves.
We want to compare social media with data sources that have already been studied, and studying data in other languages might skew the results, as the available tools may not be developed to handle those languages.

The methods being tested will be some of the most used methods, as the less used (and most likely less accurate) methods have limited value from an industry point of view. Focusing on the most used methods also ensures that tools exist, which simplifies the implementation; this is necessary as we have limited time and experience.

As the focus regarding settings is whether the accuracy of the methods can be changed at all, rather than exactly how a maximal result is reached, only a small number of different settings will be explored. This keeps implementation time down and limits the amount of results that need to be analysed. The settings selected will also be ones that are simple to implement with the chosen tools.

To summarize, this study will test a limited number of methods and datasets, due to constraints in time and experience. We will strive for a result which reflects what may be used in real applications of sentiment analysis. The study will focus on known methods needing little work to implement.

1.2 Purpose

Previous research on sentiment analysis has often focused on datasets of user reviews (Medhat et al. 2014). However, as social media is exploding in size and importance in people's lives, it is far more relevant to research this data. The purpose of this study is to examine how some common sentiment analysis methods perform when the source of data is social media. By using state of the art implementations, the results will give a good picture of which algorithms have the greatest potential in the field of sentiment analysis on social media. As well as answering which algorithms perform well on social media, this paper aims to examine what impact tuning the methods has on accuracy, and how large that impact is. In combination, this gives a starting point for further research into what should be studied and developed to achieve greater accuracy in sentiment analysis on social media.

Chapter 2

Background

This chapter describes the subject of sentiment analysis, how classification is done, how the lexical method works, how the machine learning methods work, what data is used and what the tools are. This serves as support for understanding the results and the discussion.

2.1 Sentiment Analysis

Sentiment analysis is the subject of using natural language processing to extract sentiment out of text. This can be done in several ways, but the most commonly used are lexicon based and machine learning based methods (Medhat et al. 2014). The methods serve the same purpose but achieve it in different ways: lexicon based methods use sentiment lexicons, while machine learning based methods train classifiers on data.

2.2 Classification

Splitting data into groups or classes is known as the problem of classification. This is a common problem, and natural language processing is no exception. The methods presented here may also be used for classification of other types of data. There are also a large number of methods not presented here, since they are outperformed in this rather specific field of classifying text.

Classifying data by the feeling it conveys is known as sentiment classification. Any groups of feelings can in principle be used as classes, but most common is either a binary scale of positive and negative or one that also includes a neutral option (Medhat et al. 2014, p. 1).

The two groups of methods that will be examined in this paper are the lexicon based and the machine learning based methods. These are the classification methods most commonly used in sentiment classification.

2.3 Lexicon

One method for extracting sentiment from text is a lexical approach with special lexicons. The lexicons used in this approach contain words that have been classified with sentiments. Words such as "bad" or "horrible" might have a negative classification, while words such as "good" or "amazing" might have a positive classification. Using a scoring function, every word in a text can then be assigned a sentiment (Pang & Lee 2008, p. 27). See figure 2.1 for a general picture of how this might be done.

Figure 2.1. An illustration of how the lexicon based approach classifies text
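To make the idea concrete, here is a minimal sketch of such a scoring function in Python. The tiny lexicon and the tie-break at a zero score are our own illustrative assumptions; real lexicons, such as the one used by AFINN, contain thousands of human-scored words.

```python
import re

# Toy sentiment lexicon: word -> score (illustrative, not AFINN's list).
LEXICON = {"good": 2, "amazing": 4, "bad": -2, "horrible": -3}

def score_text(text):
    """Sum the lexicon scores of all known words in the text.
    Words missing from the lexicon (e.g. stopwords) contribute zero."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)

def classify(text):
    """Map the summed score to a binary sentiment label.
    Treating a zero score as positive is an arbitrary tie-break here."""
    return "positive" if score_text(text) >= 0 else "negative"

print(classify("What an amazing movie!"))          # -> positive
print(classify("A horrible plot and bad acting"))  # -> negative
```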
2.4 Machine learning

Machine learning can be explained as having the computer learn to recognize patterns by mining data, rather than being programmed with explicit rules (Joachims 1998). This technique can be used to solve many different problems, classification being one of them. The methods used to solve the classification problem belong to the group known as supervised learning algorithms. They require annotated data to learn from, which might be hard or time consuming to acquire. This is a major drawback of using a machine learning approach. See figure 2.2 for a schematic of the learning process.

One of the simplest machine learning methods is Naive Bayes, which relies on the assumption that all features (typically words, in classification of texts) are independent. This assumption is of course false for most natural language texts, where for example the order of the words plays a huge role. The method has however proved to be quite powerful despite its simplicity (Pang et al. 2002).

Figure 2.2. An illustration of how the machine learning based approach classifies text.

Another machine learning method is the Support Vector Machine (SVM). The method is based on placing the data in a space and splitting it into a given number of groups or classes with a separating hyperplane, as seen in figure 2.3. The hyperplane should have as large a margin as possible to the classes, thereby maximizing the chance that new data ends up on the right side of it and is sorted into the right class (Tong & Koller 1998). The method can be extended to handle classification where the data is not linearly separable, giving it an edge over other machine learning algorithms. Such data is however unlikely in the realm of classifying natural language, as the data is generally sparse, which suits an SVM approach (Hoschs 2016).

Figure 2.3. H1 fails to separate the training data into the two classes; H2 and H3 both succeed, but H2 has a minimal margin. H3 is therefore the hyperplane used by the SVM, as it has a maximal margin to both classes. Figure: (ZackWeinberg 2012).

The last method to be tested is called the decision tree. The method creates a tree which is then used to classify data. Each node in the tree contains a boolean expression regarding some feature of the data. To classify a piece of data, it is taken from the root to a leaf, travelling through the tree's nodes based on its features. Each leaf node contains a class, and the data is classified upon reaching a leaf. The tree is created recursively by replacing the leaf with the lowest probability of successfully classifying data with a new subtree (Medhat et al. 2014).

2.4.1 Preprocessing

The data, in this case text, must be parsed before being fed to the algorithms. There are a number of ways to represent the words as features, the two most common being bag of words and n-grams. In the bag of words model the features of a document are represented by a set of booleans, where the value of each boolean represents whether a word exists in the document or not. The bag of words model can be extended to include the number of times a word occurs, but it remains a very simple representation as the order of the words is lost. The n-gram representation does not consider each word on its own, but sequences of n consecutive words. This representation can increase the accuracy but, as it is harder to implement, is beyond the scope of this paper (Manning et al. 2009).

Not all words in a text have an emotional meaning, and such words, known as stopwords, are therefore not interesting when doing sentiment analysis. A lexicon method effectively ignores these words by assigning them the value zero in its calculations, but a machine learning algorithm may try to find patterns involving them. Preprocessing the data by removing all stopwords may improve the accuracy of machine learning algorithms, as it prevents them from finding rules based on these words; such rules would generate false results, as they are based on coincidences rather than real patterns in natural language.
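As a concrete illustration of the pieces described above, the following sketch trains the three classifier families on bag of words features using scikit-learn. The toy training data is our own invention, and the study itself used NLTK's Naive Bayes implementation, so this is a sketch of the technique rather than the exact experimental setup.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Toy annotated training data (invented for illustration).
texts = ["great movie loved it", "amazing acting",
         "horrible plot", "bad and boring"]
labels = ["positive", "positive", "negative", "negative"]

# Bag of words: each text becomes a sparse vector of word counts;
# the order of the words is discarded.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Train each classifier on the annotated data, then classify new text.
for clf in (MultinomialNB(), LinearSVC(), DecisionTreeClassifier()):
    clf.fit(X, labels)
    prediction = clf.predict(vectorizer.transform(["loved the amazing plot"]))
    print(type(clf).__name__, prediction[0])
```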
2.5 Data

The subject of sentiment analysis gained momentum in the early 2000s with the rise of the internet. The data used in this early history of the subject was mainly reviews collected from websites aggregating them (Pang & Lee 2008, p. 4). The availability and structure of data are important and are presented in this section.

2.5.1 Social media

With the explosion of available data following the use of social media, the opportunity to mine this data has likewise increased. However, the nature and content of the data differ from other sources of natural language text, such as reviews, news stories, scientific articles or books: the language is simpler and grammatically worse than standard text (Baldwin et al. 2013).

2.5.2 Datasets

Several datasets have been created by different organizations and universities to be used in research and experimentation in sentiment analysis. Three datasets are presented here, extracted from the micro-blogging service Twitter and the Internet Movie DataBase (IMDB).

Twitter

Sentiment140
Sentiment140 is a twitter dataset compiled at Stanford university using tweets (140 character documents) which have been machine-annotated using an emoticon heuristic (Go et al. 2015). This heuristic assumes that emoticons such as ":)" indicate a positive sentiment and ":(" a negative sentiment. The data is divided into a training and a testing part. The training part consists of 1 600 000 tweets, with half annotated as positive and the other half as negative. The test set is 359 tweets, of which 182 are positive and 177 negative.
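A minimal sketch of such an emoticon heuristic is shown below. This is our own illustration of the idea, not the actual annotation code used to build Sentiment140, and the emoticon sets are examples.

```python
# Illustrative emoticon sets; the real heuristic covers more variants.
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":-("}

def emoticon_label(tweet):
    """Machine-annotate a tweet from the emoticons it contains.
    Returns None when no emoticons, or contradictory ones, are found."""
    tokens = set(tweet.split())
    has_positive = bool(tokens & POSITIVE_EMOTICONS)
    has_negative = bool(tokens & NEGATIVE_EMOTICONS)
    if has_positive == has_negative:
        return None  # ambiguous or no signal
    return "positive" if has_positive else "negative"

print(emoticon_label("finally friday :)"))           # -> positive
print(emoticon_label("my flight got cancelled :("))  # -> negative
```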
Sanders analytics twitter corpus
Sanders analytics twitter corpus, shortened to "Sanders" in this paper (Sanders 2011), is a small dataset of 5513 tweets sorted into the sentiments "Positive", "Negative", "Neutral" and "Irrelevant". Approximately 1000 of the tweets in the set are labeled "Positive" or "Negative", which is what is useful for the purpose of this study. The data was acquired by Sanders Analytics by downloading tweets from specific hashtags using the twitter API. All the tweets are annotated manually by humans.

IMDB

Large Movie Review Dataset
Large Movie Review Dataset 1.0 is a dataset of IMDB (Internet Movie DataBase) reviews with accompanying sentiment (Maas et al. 2011). The dataset consists of 50 000 reviews, where half are intended for training and the other half for testing. Both the training and the testing set are split evenly between positive and negative sentiment. The data is classified using IMDB's review score system, where a negative review has a score of ≤ 4 out of 10 and a positive review a score of ≥ 7 out of 10.

2.6 Tools

Tools and libraries for research in sentiment analysis already exist, and the aim of this section is to present some of these tools.

2.6.1 NLTK

NLTK, or "Natural Language Toolkit", is a toolkit built in Python that is used as a framework for general natural language processing needs (Natural Language ToolKit 2016). It can be used for its naive bayes classifier and for removing stopwords during data preprocessing.

2.6.2 Scikit-learn

Scikit-learn is a toolkit that consists of different machine learning algorithms that can be used in data science and sentiment analysis (scikit-learn 2014). It is implemented in Python and built on other frameworks like NumPy, SciPy and matplotlib. It contains classifier algorithms such as the Support Vector Machine and the decision tree.

2.6.3 AFINN

AFINN is a tool used to acquire a sentiment score for text using the lexicon based approach (Nielsen 2011)(Nielsen 2016). AFINN uses a human-compiled sentiment lexicon to classify text with sentiment.

Chapter 3

Method

The method of the study is divided into two parts: formatting of the data, and the testing procedure used to acquire the accuracy results. This chapter describes these processes.

3.1 Data formatting

3.1.1 Data requirements

One of the most important elements of this study is the problem of acquiring data that can be used. To be able to answer the questions posed in the problem statement we need data that adheres to some requirements:

• The data from all the databases must be in, or must be convertible into, a format that the tools can read. This is needed so that the methods' accuracy can be compared between the databases.

• The data needs to be annotated with a sentiment of positive or negative, so that the accuracy of the methods can be determined.

The datasets presented in the background all adhere to these requirements, which is why they were chosen.

3.1.2 Formatting

The datasets presented were acquired in different file types and formats. Adapting the tools to all the different data formats would take too much time and code for this study, creating the need for a common format. The format decided on was Comma Separated Values (CSV) files. Python can handle CSV files, and this format can be transformed into the tools' different input specifications. The common format was structured with the tweet/review text and the sentiment of said text, like the example below. The data was extracted and reformatted with Python code and bash shell scripts to save time.

"Sentiment", "Document text"

This format is all that the tools need to be able to get accuracy out of the algorithms.

3.1.3 Preprocessing

To be able to see whether different characteristics of the text affected the results, some preprocessing was done for the machine learning approach. The lexicon approach was expected to read the text as it is, with no difference to the results. The preprocessing done was the following (a code sketch is given after the list):

• Everything was converted to lower-case. This was done to avoid the same word being interpreted differently.

• Removal of the hash character "#" using regular expressions. This processing was done so that words such as "#amazing" would turn into "amazing"; otherwise they would be interpreted as different words.

• Substitution of usernames. Usernames on twitter begin with an "@" character. All words beginning with this character were replaced with the string "USERNAME". Usernames should not alter the sentiment of the text and were therefore removed.

• Removal of stopwords from the text. This was done using the stopword dataset supplied by NLTK.
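A minimal sketch of these four steps is shown below; the regular expressions and the function name are our own illustration rather than the study's exact code.

```python
import re
from nltk.corpus import stopwords  # requires nltk.download("stopwords")

STOPWORDS = set(stopwords.words("english"))

def preprocess(text):
    """Apply the four preprocessing steps described above."""
    text = text.lower()                       # 1. convert to lower-case
    text = text.replace("#", "")              # 2. strip hash characters
    text = re.sub(r"@\w+", "USERNAME", text)  # 3. substitute usernames
    words = [w for w in text.split()
             if w not in STOPWORDS]           # 4. remove stopwords
    return " ".join(words)

print(preprocess("@kth This is #amazing"))  # -> "USERNAME amazing"
```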
3.2 Testing procedure

The two approaches were implemented differently. To be able to replicate the study and understand some of the consequences of the results, the versions of all the tools used are presented in table 3.1.

Tool          Version number
Python        2.7.11
NLTK          3.2.1
AFINN         GitHub commit: 720367d
scikit-learn  0.17.1
NumPy         1.11.0

Table 3.1. Version numbers of the tools used.

3.2.1 Lexicon

The lexicon based approach used the sentiment scoring tool AFINN (Nielsen 2011)(Nielsen 2016). It was used in a previous thesis study for its ease of use and small need of implementation work (Sommar & Wielondek 2015); it was used in this study for the same reasons.

AFINN has a function for scoring the sentiment of a text. This function indicates whether the supplied text is positive or negative on a scale from -5 to 5. All texts with values ≤ -1 were annotated negative and texts with values ≥ 1 were annotated positive. The scoring function was used on each separate tweet or review in the test sets to get a negative or positive sentiment. For the Sanders dataset the whole set of 1091 tweets was scored instead of only a part of it. The sentiment result from AFINN was then compared to the actual sentiment to calculate the accuracy of the method on each dataset. See figure 3.1 below.

Figure 3.1. An illustration of how the accuracy was acquired for the lexicon based approach.

3.2.2 Machine learning

The machine learning based approach used the toolkits NLTK and scikit-learn for scoring the sentiment of a text. As described in the background, each classifier algorithm needs to be trained before being tested and used. We trained the Naive bayes classifier, the SVM and the decision tree with the training data. Both the Sentiment140 and the IMDB dataset had training data bundled with the test data, but for the Sanders dataset another approach had to be taken. The Sanders tweets were shuffled and then split into 1000 tweets of training data and 91 tweets of testing data; this was repeated 20 times to get an average accuracy. The repetition was needed because the dataset is much smaller and yielded large differences in results between runs.

The trained classifiers were then used on the testing data to acquire sentiment from the text. This was performed on each dataset with the corresponding classifier. The extracted sentiment was then compared to the actual sentiment of each dataset, from which the accuracy could be calculated. For each dataset all three classifiers were trained and then tested on the testing data of the same dataset. See figure 3.2 below.

Figure 3.2. An illustration of how the accuracy was acquired for the machine learning based approach.
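To make the testing procedure concrete, here is a minimal sketch of the lexicon evaluation loop: it reads a test set in the common CSV format from section 3.1.2, scores each document with AFINN, applies the thresholds above, and computes the accuracy. The file name is hypothetical, the label strings are assumed to be "positive"/"negative" in the reformatted CSV, and skipping documents with a zero score is our assumption. The same comparison against the annotated sentiment applies to the machine learning classifiers.

```python
import csv
from afinn import Afinn  # pip install afinn

afinn = Afinn()

def lexicon_accuracy(csv_path):
    """Accuracy of the AFINN method on a test set in the common
    '"Sentiment", "Document text"' CSV format."""
    correct = total = 0
    with open(csv_path) as f:
        for sentiment, text in csv.reader(f, skipinitialspace=True):
            score = afinn.score(text)
            if score >= 1:
                predicted = "positive"
            elif score <= -1:
                predicted = "negative"
            else:
                continue  # zero score: neither threshold met (our assumption)
            correct += predicted == sentiment
            total += 1
    return correct / float(total)

# Hypothetical file name for the reformatted test data.
print(lexicon_accuracy("sentiment140_test.csv"))
```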
Chapter 4

Results

Tables 4.1 to 4.3 show the accuracy of the different machine learning methods with different levels of preprocessing. The results for the decision tree method are incomplete due to the unreasonable computing time required. Table 4.4 shows the results for the lexicon method. Lastly, figure 4.1 compares the best result for each method.

              Naive Bayes  SVM     Decision Tree
Sentiment140  80.45%       77.37%  No data
Sanders       70.05%       66.48%  63.79%
IMDB          80.35%       78.26%  No data

Table 4.1. The accuracy of the machine learning methods with no preprocessing.

The Naive Bayes method achieves the highest accuracy on all datasets, with accuracies between 70 and 80%. Significantly lower accuracy is found for all methods when the Sanders dataset is used.

              Naive Bayes  SVM     Decision Tree
Sentiment140  78.83%       76.88%  No data
Sanders       71.15%       66.21%  63.74%
IMDB          80.89%       77.94%  No data

Table 4.2. The accuracy of the machine learning methods with the twitter syntax removed.

The results are similar to table 4.1; Naive Bayes is the best performing method for all datasets. The removal of twitter syntax has little impact, mostly changing the accuracy by less than 1%. For some methods and datasets the accuracy decreases.

              Naive Bayes  SVM     Decision Tree
Sentiment140  81.56%       79.05%  No data
Sanders       71.18%       67.91%  62.80%
IMDB          80.29%       78.00%  No data

Table 4.3. The accuracy of the machine learning methods with the twitter syntax removed as well as stopwords.

The results are similar to table 4.1; Naive Bayes is the best performing method for all datasets. The removal of stopwords increases the accuracy on the Sentiment140 dataset by 2-3%, while giving either a gain or a loss of approximately 1% for the other datasets (compared to table 4.2).

              Lexicon
Sentiment140  76.87%
Sanders       84.38%
IMDB          69.67%

Table 4.4. The accuracy of the lexicon method.

The results for the lexicon method show that it performs best on the Sanders dataset and worst on the reviews from IMDB.

Figure 4.1. Graphical representation of the best result for each method on each dataset.

Chapter 5

Discussion

5.1 Machine learning

Comparing the lexicon method's performance on the reviews from IMDB (table 4.4) with the results of the machine learning methods shows that the machine learning algorithms perform better than the lexicon method on the IMDB reviews.

Preprocessing the data by removing twitter specific syntax and stopwords gives in most cases some improvement to the machine learning algorithms. With the exception of the decision tree method and the IMDB dataset, the best accuracy is found in table 4.3. It should be easier to find the patterns which carry sentimental meaning if white noise is removed, which is reflected in our results. Why the decision tree behaves differently is hard to explain. If all algorithms behaved like that, one could suggest that the hashtags and stopwords actually contain information, but as it is only one algorithm on a rather small dataset, it should be treated as an anomaly rather than anything else. Overall, the machine learning algorithms can be configured to perform better, even if the difference is quite small.

5.2 Lexicon

The lexicon method is almost as good as machine learning on the Sentiment140 dataset and outperforms all machine learning methods on the Sanders database. One possible explanation of why the lexicon method performs well on the twitter datasets is that the language in these is more compact and simple than in a movie review.
The fact that the language used in social media may be far from correct may be a problem when a lexicon method is used. Unlike machine learning methods, which find patterns in any data, a lexicon method is constructed to understand a particular language. This could be a problem if the language in the data differs from what the lexicon method is constructed to classify. However, judging from the results, this cannot be concluded to be a problem here, as the lexicon method performs well.

As seen in table 4.4, the lexicon method performs vastly differently on the two twitter datasets, with a difference of almost 10 percentage points. While the machine learning methods do perform better when trained with a larger set, there is no clear reason why a lexicon method would behave like this. One possible explanation is that the datasets differ in more ways than size. Even if both datasets consist of tweets, they have been annotated differently: Sanders by hand and Sentiment140 automatically. There is a possibility that this is the source of the difference: the lexicon method disagrees with the program which annotated Sentiment140 but agrees with the human interpretation of sentiment in tweets. However, a very likely source of the difference is the natural distribution of a method's accuracy across different datasets. As the Sanders dataset is small, an accidentally extreme result is possible.

5.3 Implementation

The great strength of machine learning is that, if presented with more data, it will perform better. Our results reflect this, as the larger Sentiment140 database gives better results for all machine learning algorithms compared to the smaller Sanders database. Increasing the amount of data to improve the machine learning approaches comes at a price, as the computing time and the amount of memory needed increase. This is especially true if NLTK is used, as it is far from effective in either aspect. Since NLTK is written in Python it is slow and inefficient, which poses a real problem. During the work several of the tests failed because of memory errors with the Windows distribution of NLTK, and since the tests were time consuming, the cost of testing many configurations quickly climbed. The machine learning approach of constructing decision trees had to be omitted for the larger datasets because the trees took too long to construct with the tools at hand and the size of the data.

There are other toolkits and implementations available, written in other languages, which most likely are more effective and better suited for large scale computations. However, these toolkits may prove harder to use and in the end not be worth the time. As we found NLTK to be widely used in research, we assumed it to be representative of the algorithms, even if NLTK may not be the best tool for processing large sets of data.

5.4 Applications and future research

Our results show that the lexicon based method has potential in the field of sentiment analysis on social media and should not be underestimated. It could therefore be used in real applications and should be the subject of further research, for example analyzing more datasets with more implementations, as datasets from other sources within social media and different lexicons would probably give other results. In our study we did not compare runtime as a variable, but for reasons of practicality and interest this should also be researched.

Chapter 6

Conclusion

Switching the source of data from reviews to social media does affect the accuracy of sentiment analysis methods.
A lexicon approach shows potential, as it performs better on the social media datasets than the machine learning methods, possibly due to the compact nature of the social media data. The accuracy of the machine learning methods can be improved through preprocessing, but the improvement is small. A more significant gain in accuracy is achieved by increasing the size of the training dataset.

Bibliography

Baldwin, T., Cook, P., Lui, M., MacKinlay, A. & Wang, L. (2013), How noisy social media text, how diffrnt social media sources?
URL: http://cs.unb.ca/~ccook1/ijcnlp2013-socmed.pdf

Go, A., Bhayani, R. & Huang, L. (2015), 'Sentiment140'.
URL: http://help.sentiment140.com/for-students/

Hindersson, T. & Lousseief, E. (2015), 'Smirking or smiling smileys? - evaluating the use of emoticons to determine sentimental mood'.
URL: http://www.diva-portal.org/smash/get/diva2:811037/FULLTEXT01.pdf

Hoschs, W. L. (2016), 'Machine learning'.
URL: http://global.britannica.com/technology/machine-learning

Joachims, T. (1998), 'Text categorization with support vector machines'.
URL: http://www.cs.cornell.edu/people/tj/publications/joachims98a.pdf

Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y. & Potts, C. (2011), Learning word vectors for sentiment analysis, in 'Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies', Association for Computational Linguistics, Portland, Oregon, USA, pp. 142–150.
URL: http://www.aclweb.org/anthology/P11-1015

Manning, C. D., Raghavan, P. & Schütze, H. (2009), An Introduction to Information Retrieval, Cambridge University Press, Cambridge, England.

Medhat, W., Hassan, A. & Korashy, H. (2014), 'Sentiment analysis algorithms and applications: A survey'.
URL: http://www.sciencedirect.com/science/article/pii/S2090447914000550

Natural Language ToolKit (2016).
URL: http://www.nltk.org/

Nielsen, F. Å. (2011), A new ANEW: evaluation of a word list for sentiment analysis in microblogs, in M. Rowe, M. Stankovic, A.-S. Dadzie & M. Hardey, eds, 'Proceedings of the ESWC2011 Workshop on "Making Sense of Microposts": Big things come in small packages', Vol. 718 of CEUR Workshop Proceedings, pp. 93–98.
URL: http://ceur-ws.org/Vol-718/paper16.pdf

Nielsen, F. Å. (2016), 'Afinn sentiment analysis in python'.
URL: https://github.com/fnielsen/afinn

Pang, B. & Lee, L. (2008), 'Opinion mining and sentiment analysis'.
URL: http://www.cs.cornell.edu/home/llee/omsa/omsa.pdf

Pang, B., Lee, L. & Vaithyanathan, S. (2002), Thumbs up? sentiment classification using machine learning techniques, pp. 79–86.
URL: http://www.cs.cornell.edu/home/llee/papers/sentiment.pdf

Sanders, N. J. (2011), 'Sanders-twitter sentiment corpus'.
URL: http://www.sananalytics.com/lab/twitter-sentiment/

scikit-learn (2014).
URL: http://scikit-learn.org/stable/index.html

Sommar, F. & Wielondek, M. (2015), 'Combining lexicon- and learning-based approaches for improved performance and convenience in sentiment classification'.
URL: http://www.diva-portal.org/smash/get/diva2:811021/FULLTEXT01.pdf

Tong, S. & Koller, D. (1998), 'Support vector machine active learning with applications to text classification', p. 1.
URL: http://ai.stanford.edu/~koller/Papers/Tong+Koller:ICML00.pdf

ZackWeinberg (2012), 'Svm separating hyperplanes'.
URL: https://commons.wikimedia.org/wiki/File:Svm_separating_hyperplanes_(SVG).svg