Word Sense Disambiguation of Opinionated Words Using Extended Gloss Overlap

Bernadette Rosario C. Razon, De La Salle University
Charibeth K. Cheng, De La Salle University

ABSTRACT
The increasing use of digital technology and social media has prompted government organizations to take advantage of these in order to gather feedback and opinions from the general public on policies and laws. Since people's opinions vary, there is a need to present these in an orderly manner for faster and more efficient decision- and policy-making. This is where opinion classification comes in. Existing systems such as VoxPop [5] are able to classify texts according to polarity with the help of SentiWordNet, which assigns subjectivity and polarity scores to WordNet synsets. However, the current algorithm that VoxPop uses yields an accuracy rate of only 50.5%. This research integrated a word sense disambiguation (WSD) algorithm into VoxPop. The extended gloss overlap algorithm was used, which compares two synsets and scores the relatedness between them by looking for phrasal matches between all their glosses. With the integration of WSD into VoxPop, its accuracy increased to 60%.

Keywords
Word Sense Disambiguation, Sentiment Analysis

1. INTRODUCTION
With the ever-increasing use of technology, especially social media, it is not at all surprising that information is exchanged at a much faster pace. The amount of data being shared, especially through the World Wide Web, shows no signs of decreasing; on the contrary, the growing presence of platforms such as social networking sites and online forums provides more venues for exchanging ideas and opinions.

However, it is also because of the increasing volume of data on the Web that a structured and organized presentation of certain data becomes highly relevant. For example, there may be numerous online reviews of a newly released product, and a potential buyer may be interested in reading only the positive ones. It would be tedious to read through so many articles just to see whether they are positive or not. It would be beneficial if such data were automatically classified by polarity to save time and effort.

VoxPop is "a web-based opinion detection and classification system which organizes, analyzes and manages English commentaries" [5]. Akin to online forums, the system accepts comments on certain topics as input. It then detects the opinions, classifies them according to their polarity, and further clusters them according to subtopics. For the opinion classification submodule, VoxPop uses SentiWordNet to identify word polarity. SentiWordNet was created using semi-supervised synset classification to assign three scores (Objective, Positive and Negative) that determine the polarity of a synset. The scores were derived using eight ternary classifiers [6].

The rest of the paper is organized as follows: Section 2 discusses the literature related to the study. Section 3 presents the methodology of the research, including the testing methods. Section 4 presents the results and analysis, while Section 5 presents recommendations for further research on the topic.

2. RELATED LITERATURE
2.1 Polarity Classification
Polarity classification aims to determine whether a given text has a positive, negative, or in some cases neutral stance in relation to a particular issue or topic. It can be done at the word, sentence, or even document level. Generally, algorithms make use of existing lexicons or word sets for word-level polarity to aid in classifying polarity at higher levels.

The work described in [19] makes use of a list of seed words in order to determine the polarity of other words. It uses a modified log-likelihood ratio to determine the polarity of the target word based on its collocation frequency with the seed words in the input sentence. The resulting scores of the words are then averaged to determine the polarity of the opinion sentence.

On the other hand, a subjectivity lexicon consisting of words tagged as negative, positive and neutral was used in the work described in [9]. It employs a rule-based algorithm based on the notion that sentiment-bearing sentences follow certain structures and that these structures can be encapsulated into a number of rules. Chunked text is taken as input and rules are fired using a cascade of transducers, proceeding from word-level to phrase-level and finally to sentence-level polarity classification.

Although there has been much work involving polarity classification, improvements can still be made. An example is VoxPop [5], which detects and classifies opinionated parts of commentaries by polarity. Using scores found in SentiWordNet [6], the system averages the scores of all the synsets of a word and chooses the polarity with the higher average score as the overall polarity of the word. The scores are then used for both sentence-level and commentary-level classification. This approach, although straightforward, yields an accuracy of only 50.5%. One probable reason is that the system does not take into consideration the context in which a term occurs.

Word sense disambiguation (WSD) is the task of "identifying the meaning of words in context in a computational manner" [12]. It is often used to improve tasks such as information retrieval, part-of-speech tagging and machine translation. Some WSD algorithms compare a target word to the words surrounding it, while others are example-based. Integrating a WSD algorithm may be the key to improving the task of polarity classification.
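To make the contrast between averaging over all synsets and scoring only a context-selected sense concrete, the following is a minimal Python sketch under stated assumptions; it is not the VoxPop or authors' implementation. The toy word "cold", its glosses, the context glosses and all polarity scores are invented for illustration and are not WordNet or SentiWordNet entries, and every function name is hypothetical. The overlap function borrows the phrasal scoring idea of the extended gloss overlap [4] (a shared run of n consecutive words scores n squared), but it compares each sense gloss with only one gloss per context word, omits the glosses of related synsets, and ignores overlaps made only of words from an illustrative stopword list. Treating a tie as neutral is also an assumption.

```python
# Minimal sketch (hypothetical names, toy data): averaging polarity over all
# senses versus using only the sense chosen by a gloss-overlap disambiguator.

# Invented glosses and scores for a toy word "cold"; NOT WordNet/SentiWordNet.
SENSES = [
    {"id": 1, "gloss": "having a low temperature", "pos": 0.0, "neg": 0.125},
    {"id": 2, "gloss": "lacking warmth of feeling toward other people",
     "pos": 0.0, "neg": 0.75},
    {"id": 3, "gloss": "of a drink served at a low temperature and refreshing",
     "pos": 0.5, "neg": 0.0},
]

# One invented gloss per context word, e.g. for "The cold drink was refreshing".
CONTEXT_GLOSSES = ["a liquid served to drink", "pleasantly new and refreshing"]

# Illustrative stopword list; overlaps made only of these words score nothing.
STOPWORDS = {"a", "an", "and", "at", "of", "or", "the", "to", "with"}


def longest_common_run(a, b):
    """Return (length, i, j) of the longest run of consecutive equal words."""
    best = (0, 0, 0)
    for i in range(len(a)):
        for j in range(len(b)):
            n = 0
            while i + n < len(a) and j + n < len(b) and a[i + n] == b[j + n]:
                n += 1
            if n > best[0]:
                best = (n, i, j)
    return best


def phrasal_overlap(gloss_a, gloss_b):
    """Score two glosses: each shared phrase of n consecutive words adds n**2."""
    a, b = gloss_a.lower().split(), gloss_b.lower().split()
    score = 0
    while True:
        n, i, j = longest_common_run(a, b)
        if n == 0:
            return score
        if any(word not in STOPWORDS for word in a[i:i + n]):
            score += n * n
        # Replace matched words with unique placeholders so they are not reused.
        for k in range(n):
            a[i + k] = object()
            b[j + k] = object()


def choose_sense(senses, context_glosses):
    """Pick the sense whose gloss overlaps most with the context glosses."""
    return max(senses, key=lambda s: sum(phrasal_overlap(s["gloss"], g)
                                         for g in context_glosses))


def polarity(pos, neg):
    """Higher score wins; a tie is treated as neutral (an assumption)."""
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def average_polarity(senses):
    """Baseline: average the scores of all senses, ignoring context."""
    pos = sum(s["pos"] for s in senses) / len(senses)
    neg = sum(s["neg"] for s in senses) / len(senses)
    return polarity(pos, neg)


def sense_polarity(sense):
    """WSD variant: use only the scores of the selected sense."""
    return polarity(sense["pos"], sense["neg"])


if __name__ == "__main__":
    best = choose_sense(SENSES, CONTEXT_GLOSSES)
    print("average over all senses:", average_polarity(SENSES))  # negative
    print("selected sense:", best["id"], sense_polarity(best))   # 3 positive
```

In this toy case, sense 3 shares "drink", "served" and the two-word phrase "and refreshing" with the context glosses (1 + 1 + 4 = 6), so it is selected and the word is scored as positive, whereas averaging over all three senses lets the strongly negative sense 2 dominate.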
Proceedings of the 8th National Natural Language Processing Research Symposium, pages 1-5, De La Salle University, Manila, 24-25 November 2011

2.2 Word Sense Disambiguation
In the case of the Lesk algorithm [11], the dictionary definitions of two terms are compared and the appropriate sense for each word is determined by the frequency of common words between the definitions. This algorithm is said to have paved the way for more research on word sense disambiguation [13].

Another algorithm, the extended gloss overlap [4], has its foundations in the original Lesk algorithm. However, the algorithm gives a higher score to phrasal matches instead of treating glosses as a bag of terms regardless of word order. Furthermore, the words compared are not limited to those present in the input text: glosses of related words obtained through WordNet also play a role in determining the sense of a term.

The gloss vector algorithm also compares the glosses of two terms or concepts. One distinct feature of the algorithm is that it uses a co-occurrence matrix representing the frequency of any two words appearing together in a WordNet gloss. The glosses of the two terms to be compared are represented as vectors, and for each term the algorithm outputs a vector representing the overall concept of the term.

An algorithm called the Chain Algorithm for Disambiguation (CHAD) [15] also compares the glosses of words. In this case, words are disambiguated in triplets such that the sense of the third word is determined with the help of the senses of the two words before it. The algorithm, however, does not consider the senses of succeeding words.

The work in [17] describes an algorithm for disambiguating Chinese adjectives. It uses pointwise mutual information to compute the association strength between nouns and adjectives in the training data, and uses the results to disambiguate the adjectives. It achieved very high precision but very low recall.

A task called subjectivity word sense disambiguation is described in [2]. Using machine learning features for WSD, terms are labeled as subjective or objective depending on how they are used in a sentence and whether they have an impact on the overall polarity of the sentence. However, it did not make a significant improvement over the original classifier in terms of accurately labeling sentence-level polarity.

3. METHODOLOGY
The research focused on integrating word sense disambiguation (WSD) into the opinion classification task and seeing how this affects overall results. The WSD algorithm used was the extended gloss overlap described in [4], because it had the highest accuracy in disambiguating terms compared to other similar WSD algorithms [5].

The algorithm was integrated into VoxPop, a system that detects opinions in commentaries and classifies these by topic as well as polarity. VoxPop employs a three-tiered approach to polarity classification. The original VoxPop algorithm uses adjectives and adverbs in determining a commentary's polarity. However, this research also made use of nouns and verbs in the classification task as well as in the WSD algorithm, because certain nouns (e.g. honesty) and verbs (e.g. agree) also lean towards a certain polarity. Furthermore, only the polarity scores of the word sense identified by the WSD algorithm are used.

Two sets of data were used. Both consist of commentaries taken from the Inbox World section of The Philippine Star. The first set consists of 200 commentaries that were used to evaluate the original VoxPop system. These commentaries were evaluated by a linguist, and each commentary was manually classified as positive, negative or neutral. The linguist's evaluation was then compared with the results yielded by the system in the different tests. In order to see if a single annotator's bias has a significant effect on the accuracy, the system was also tested on another set of inter-annotated data. The second set consists of 474 commentaries annotated by three individuals.

To examine the effect of integrating WSD into the opinion classification task, three types of tests were done on each data set. The first type of test was conducted using the VoxPop system without the WSD algorithm. The second type used the VoxPop system with the extended gloss overlap, but included only the main gloss of each content word in the data. The third type used the VoxPop system integrated with the extended gloss overlap algorithm, this time including the glosses of related synsets.

Each set of tests on the system also used different combinations of parts of speech. The groupings are as follows: adjectives and adverbs; adjectives, adverbs and nouns; adjectives, adverbs and verbs; and lastly adjectives, adverbs, nouns and verbs. Adjectives and adverbs were retained in all four groups because they form the basic components of an opinionated phrase or sentence.

Furthermore, the accuracy of the resources used to determine word polarity was also examined. For each type of test run on the system, SentiWordNet 1.0 and SentiWordNet 3.0 were used alternately to see if the improved accuracy of SentiWordNet 3.0 [3] would also improve overall results.

4. RESULTS AND ANALYSIS
4.1 Linguist-evaluated data
The following four tables contain the results of the tests performed using the original data compiled by the VoxPop proponents.

Table 1. Results on VoxPop data using adjectives and adverbs
Algorithm | SWN1 | SWN3
No WSD | 46.46% | 48.48%
Extended Gloss Overlap without Related Synsets | 46.97% | 48.48%
Extended Gloss Overlap with Related Synsets | 46.97% | 46.46%

Table 2. Results on VoxPop data using nouns, adjectives and adverbs
Algorithm | SWN1 | SWN3
No WSD | 43.94% | 45.45%
Extended Gloss Overlap without Related Synsets | 45.45% | 48.99%
Extended Gloss Overlap with Related Synsets | 43.43% | 49.49%

Table 3. Results on VoxPop data using verbs, adjectives and adverbs
Algorithm | SWN1 | SWN3
No WSD | 46.46% | 46.97%
Extended Gloss Overlap without Related Synsets | 45.45% | 47.47%
Extended Gloss Overlap with Related Synsets | 42.93% | 50.00%

Table 4. Results on VoxPop data using nouns, verbs, adjectives and adverbs
Algorithm | SWN1 | SWN3
No WSD | 41.92% | 41.92%
Extended Gloss Overlap without Related Synsets | 44.44% | 45.96%
Extended Gloss Overlap with Related Synsets | 45.45% | 45.45%

Generally, the use of SentiWordNet 3.0 yielded better results than the use of SentiWordNet 1.0. This can be seen across many combinations of word sense disambiguation algorithms. Since SentiWordNet 3.0 used disambiguated WordNet glosses in determining polarity scores, the resource as a whole is said to be 20% more precise than its predecessor [3]. However, this generalization does not hold uniformly, since the differences vary widely, ranging anywhere from 0% to more than 7%.

4.2 Inter-annotated data
The purpose of evaluating the commentaries by inter-annotator agreement is to see how different people view similar commentaries.

Table 8. Results on unanimously classified data using verbs, adjectives and adverbs
Algorithm | SWN1 | SWN3
No WSD | 59.71% | 54.68%
Extended Gloss Overlap without Related Synsets | 57.55% | 55.40%
Extended Gloss Overlap with Related Synsets | 49.64% | 52.52%

Table 9. Results on unanimously classified data using verbs, nouns, adjectives and adverbs
Algorithm | SWN1 | SWN3
No WSD | 60.43% | 57.55%
Extended Gloss Overlap without Related Synsets | 63.31% | 60.43%
Extended Gloss Overlap with Related Synsets | 53.24% | 54.68%

There is a slight improvement in accuracy compared to the results of the tests run on the VoxPop data. However, the improvement is not significant.
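The agreement categories summarized in Table 5 can be derived mechanically from the three annotators' labels. The sketch below is illustrative only: the function names `agreement_category` and `summarize` and the toy labels are hypothetical, and it assumes exactly three polarity labels per commentary. Note that "at least two annotators agree" includes the unanimous cases, which is why the 394 commentaries with at least two matching labels and the 80 with all-different labels add up to 474.

```python
from collections import Counter


def agreement_category(labels):
    """Classify one commentary's three annotator labels.

    'unanimous' if all three agree, 'majority' if exactly two agree,
    'all_different' if each annotator chose a different polarity.
    """
    top_count = Counter(labels).most_common(1)[0][1]
    return {3: "unanimous", 2: "majority", 1: "all_different"}[top_count]


def summarize(annotations):
    """Count agreement categories over a list of label triples."""
    counts = Counter(agreement_category(labels) for labels in annotations)
    total = len(annotations)
    return {
        "total": total,
        "unanimous": counts["unanimous"],
        # "At least two agree" includes the unanimous commentaries.
        "at_least_two": counts["unanimous"] + counts["majority"],
        "all_different": counts["all_different"],
        "unanimous_rate": round(100 * counts["unanimous"] / total, 2),
    }


if __name__ == "__main__":
    toy = [
        ("positive", "positive", "positive"),
        ("positive", "negative", "positive"),
        ("positive", "negative", "neutral"),
        ("negative", "negative", "negative"),
    ]
    print(summarize(toy))
```

Applied to the counts reported in Table 5, the same unanimous-rate formula gives round(100 * 139 / 474, 2) = 29.32, the percentage quoted in the text.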
Table 5 gives a summary of the annotators' evaluation.

Table 5. Statistics on Annotators' Evaluation
Statistic | Count
Number of commentaries with completely different classifications | 80
Number of commentaries with at least 2 same polarity classifications | 394
Number of commentaries with 3 same polarity classifications | 139

Of the 474 commentaries that were evaluated, only 139 (29.32%) were tagged with the same polarity by all three annotators. On the other hand, 394 commentaries were tagged with the same polarity by at least two annotators. That left 80 commentaries, which were tagged with a different polarity by each annotator. This shows that it is difficult for humans to unanimously agree on a commentary's polarity.

For a direct comparison of the results on the linguist-annotated data and the inter-annotated data, only the results for the unanimously classified inter-annotated data are considered. Tables 6 to 9 show the results of the tests run on the inter-annotated data, with each table covering one part-of-speech group.

Table 6. Results on unanimously classified data using adjectives and adverbs
Algorithm | SWN1 | SWN3
No WSD | 53.24% | 54.68%
Extended Gloss Overlap without Related Synsets | 53.96% | 55.40%
Extended Gloss Overlap with Related Synsets | 53.96% | 55.40%

Table 7. Results on unanimously classified data using nouns, adjectives and adverbs
Algorithm | SWN1 | SWN3
No WSD | 53.96% | 57.55%
Extended Gloss Overlap without Related Synsets | 57.55% | 60.43%
Extended Gloss Overlap with Related Synsets | 55.40% | 56.83%

4.3 Factors
Although using SentiWordNet 3.0 yielded better results than using SentiWordNet 1.0, the overall accuracy of the system across all combinations of algorithms and parts of speech is quite low. The following were identified as factors that contributed to the low improvement:

4.3.1 Presence of highly polarized words
Certain terms in a commentary may have polarities that are too strong, rendering the polarities of other words insignificant. Take the following statement as an example: "Maybe this could be attributed to Chief Justice Puno's relentless efforts in cleansing the ranks of the judiciary." The linguist classified this commentary as positive, while the system classified it as negative. This result was consistent across all combinations of part-of-speech groups, algorithms and SentiWordNet resources. The fault lies in the polarity of the word relentless, which has a high negative score in both SentiWordNet 1.0 and 3.0.

4.3.2 Presence of contextual valence shifters
Words such as not, too, should and although change the polarity of the words next to them. However, this is not considered by the VoxPop formula. In the VoxPop data, most commentaries classified as neutral by the linguist are classified as either positive or negative by the system. The statement "Integrity and honesty do not translate to good governance in solving our country's economy and complex society full of backbiters." was classified by the linguist as neutral, while the system classified it as negative. In the statement, the term not shifts the polarity of the whole sentence, whereas it should only influence the succeeding word or the phrase that contains it.

4.3.3 Accuracy of resources used
The resources used also have their own deficiencies, such as words tagged with the incorrect part of speech, as well as terms and/or particular senses of terms having no equivalent score in SentiWordNet.

4.3.4 Inaccuracy of the WSD algorithm
There are also instances where the word sense disambiguation algorithm assigned the incorrect sense to a word. For example, in the sentence "That is why we need a leader, a president with spiritual and moral aptitude, who will influence Filipinos by the way he lives.", the term spiritual was tagged with its 4th sense. Table 10 shows the different glosses of spiritual as an adjective.

Table 10. Glosses for the term "spiritual"
Sense | Gloss
1 | concerned with sacred matters or religion or the church
2 | concerned with or affecting the spirit or soul
3 | lacking material body or form or substance
4 | resembling or characteristic of a phantom

Given these glosses, it is more appropriate to tag the term spiritual with its second sense, since that gloss is more in line with what the statement implies.

5. RECOMMENDATIONS
The research can be improved in many aspects. One of these is the creation of resources to help process commentaries that are written in both English and Tagalog. A good number of the commentaries use Filipino words, either as part of an English sentence or in entire sentences. The creation of a Filipino WordNet or even a Filipino SentiWordNet, used together with their English counterparts, may help increase the accuracy of the opinion classification task, and such resources could eventually be used in other Natural Language Processing tasks as well.

In creating the sentiment resource, the three-type scoring concept of SentiWordNet can be employed, as this seems to be more flexible. However, the scores of each word should differ or be adjusted based on the domain it is used in. The word list should also include both English and Filipino terms used in commentaries and other forms of media. Most importantly, even if the scores are automatically or semi-automatically generated, they must be validated by at least three experts in a particular domain. This is to ensure that the scores used for each word are more or less correct.

Another recommendation is using WordNet 3.0 in testing the effectiveness of the WSD task as well as the opinion classification task. Since SentiWordNet 3.0 was created using the glosses of WordNet 3.0, the rate of finding the polarity of a certain term or sense in SentiWordNet 3.0 may increase. The senses assigned to words in the WSD task may also be more appropriate.

The significance of contextual valence shifters [14] in a commentary should also be studied. Apart from negatives such as not and neither, valence shifters also include intensifiers (e.g. rather, deeply) and modal operators (e.g. could, should), which can signal that succeeding words are not necessarily opinions but rather suggestions. There are also presuppositional items (e.g. She made the cut. vs. She barely made the cut.) and connectors (e.g. Although Rose is very smart, she is quite lazy.). These aspects of language are not covered by the current VoxPop formula, since it does not take into account how combinations of words can have a different meaning than when they are taken as separate lexical items.

Modifications to the VoxPop formula should also be considered. Computing polarities by phrases or clauses instead of taking the polarities of individual terms may prove to be more accurate, especially if the presence of contextual valence shifters is taken into account. This is because a valence shifter can change the polarity of the word immediately next to it, or even of a group of words, depending on its type.

Another recommendation is to tackle sentiment classification based on the relationship of commentaries to a hook question or thread title. The current classification into positive, negative and neutral can be ambiguous for some topics. Take the question "Should the death penalty be reinstated?" as an example. A commentary such as "The death penalty should be reinstated because it can potentially lessen the crime rate in the Philippines." can be interpreted as positive by someone who agrees with it. However, for someone who is against the death penalty, this commentary is negative. In other words, the polarity of a commentary can be highly relative to an individual's beliefs and values. It would make more sense to classify a commentary based on its agreement with the given topic.

The last recommendation is adding domain knowledge to the opinion classification task. Since the SentiWordNet resources were made for use in multiple domains, certain words can have different polarities when contextualized to a specific domain.

6. REFERENCES
[1] Agarwal, A., Biadsy, F., & McKeown, K. (2009). Contextual Phrase-Level Polarity Analysis using Lexical Affect Scoring and Syntactic N-grams. Proceedings of the 12th Conference of the European Chapter of the ACL (pp. 24-32). Athens, Greece: Association for Computational Linguistics.
[2] Akkaya, C., Wiebe, J., & Mihalcea, R. (2009). Subjectivity Word Sense Disambiguation. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (pp. 190-199). Singapore: Association for Computational Linguistics.
[3] Baccianella, S., Esuli, A., & Sebastiani, F. (2010). SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. Proceedings of the 7th Conference on Language Resources and Evaluation (LREC'10) (pp. 2200-2204). Valletta, Malta: European Language Resources Association (ELRA).
[4] Banerjee, S., & Pedersen, T. (2003). Extended Gloss Overlaps as a Measure of Semantic Relatedness. Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (pp. 805-810). Acapulco, Mexico.
[5] Bautista, G. Z., Garcia, M. A., & Tan, R. J. (2010, September 1). VoxPop: Automated Opinion Detection and Classification with Data Clustering. Manila, Philippines.
[6] Esuli, A., & Sebastiani, F. (2006). SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining. Proceedings of the 5th Conference on Language Resources and Evaluation (pp. 417-422).
[7] Fellbaum, C. (Ed.). (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
[8] Howes, D. (2001). e-Legislation: Law-Making in the Digital Age. McGill Law Journal, 47(1), 39-57.
[9] Klenner, M., Petrakis, S., & Fahrni, A. (2009). Robust Compositional Polarity Classification. Proceedings of the International Conference on Recent Advances in Natural Language Processing 2009 (pp. 180-184). Borovets, Bulgaria: Association for Computational Linguistics.
[10] Kolte, S. G., & Bhirud, S. G. (2008). Word Sense Disambiguation Using WordNet Domains. 2008 First International Conference on Emerging Trends in Engineering and Technology (pp. 1187-1191). Los Alamitos, California: IEEE Computer Society.
[11] Lesk, M. (1986). Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. Proceedings of the 5th Annual International Conference on Systems Documentation (pp. 24-26). New York: Association for Computing Machinery.
[12] Navigli, R. (2009). Word Sense Disambiguation: A Survey. ACM Computing Surveys, 41(2).
[13] Pedersen, T., Banerjee, S., & Patwardhan, S. (2005). Maximizing Semantic Relatedness to Perform Word Sense Disambiguation. Minneapolis: University of Minnesota Supercomputing Institute.
[14] Polanyi, L., & Zaenen, A. (2004). Contextual Valence Shifters. Proceedings of AAAI Spring Symposium on Exploring Attitude and Affect in Text (pp. 106-111). California: Stanford University.
[15] Tatar, D., Serban, G., Mihis, A., Lupea, M., Lupsa, D., & Frentiu, M. (2007). A Chain Dictionary Method for Word Sense Disambiguation and Applications. Proceedings of the International Conference on Knowledge Engineering, Principles and Techniques (pp. 41-49). Cluj-Napoca.
[16] Wawer, A. (2010). Is Sentiment a Property of Synsets? Evaluating Resources for Sentiment Classification using Machine Learning. Proceedings of the Seventh Conference on International Language Resources and Evaluation (pp. 1101-1104). Valletta, Malta: European Language Resources Association (ELRA).
[17] Wu, Y., Wang, M., & Jin, P. (2008). Disambiguating Sentiment Ambiguous Adjectives. 2008 International Conference on Natural Language Processing and Knowledge Engineering (pp. 1-8). Beijing: Institute of Electrical and Electronics Engineers.
[18] Yarowsky, D. (1993). One sense per collocation. Proceedings of the Workshop on Human Language Technology (pp. 266-271). Princeton, New Jersey: Association for Computational Linguistics.
[19] Yu, H., & Hatzivassiloglou, V. (2003). Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying the Polarity of Opinion Sentences. Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (pp. 129-136). Morristown, NJ, USA: Association for Computational Linguistics.