2009 Eighth International Symposium on Natural Language Processing

Who Speaks for Whom? Towards Analyzing Opinions in News Editorials

Bal Krishna Bal and Patrick Saint-Dizier

Manuscript received August 8, 2009. B. Krishna Bal is with Madan Puraskar Pustakalaya, Lalitpur, Patan Dhoka, Nepal (phone: 977-1-5521393; fax: 977-1-5536390; e-mail: bal@mpp.org.np). P. Saint-Dizier is with IRIT, 118 Narbonne, 31062 Toulouse, France (e-mail: [email protected]). 978-1-4244-4139-6/09/$25.00 ©2009 IEEE

Abstract—This paper discusses ongoing work on editorial analysis and synthesis construction, in particular text annotation and the linguistic criteria for distinguishing between facts and opinions. We also discuss the factors that play a crucial role in determining the strength of opinions, as well as the process of outlining the argumentation structure of editorials, a major part of our work that directs the analysis of opinions at the discourse level.

I. INTRODUCTION

With the increasing interest of the general public in socio-political happenings, it has become common practice to read and analyze the different opinions on a particular event published by the media in the form of editorials. Such an analysis not only helps one understand how a particular event has been perceived by different media sources but also provides a relatively faithful view of the happenings, and hence is of primary interest to journalists, public figures and political analysts. The online resource http://www.nepalmonitor.com, for instance, includes editorials from different national and international newspapers organized on a monthly basis. These editorials deal with some of the prime events that have taken place in Nepal in a particular month. The editorial sources at the link above include Voice of America, The Japan Times, The Washington Times, The New Nation – Bangladesh, Dawn, Gulf News – UAE, The Himalayan Times, The Kathmandu Post, The Hindu – India, Times of India, The Indian Express and Economic Times – India. It is interesting to see how these editorials differ in opinion, how convincing or persuasive their arguments are in supporting certain conclusion(s) and, where possible, to judge the different degrees of bias and prejudice evident in them. These problems are quite difficult even for humans, let alone machines.

From an automation perspective, it would be useful to have a mechanism for constructing a synthesis of the different opinionated arguments (Positive, Negative and Neutral) in one document, with useful information (source, date, orientation, etc. of the editorial) clearly indicated, so that readers need not go through all of the editorials yet still get a vivid picture of the happenings or events. Even better would be a mechanism to track changes in opinion across editorials on a common topic over time. The proposed work aims to build a framework, and more precisely a computational linguistic model, that suggests appropriate techniques and methods for analyzing editorials and constructing a synthesis. At present, we have identified the different linguistic components and are working towards specifying the underlying computational procedures required for the model.

The organization of the paper is as follows. In section I, we introduce our problem, state the research aims and briefly describe the current status of the work. In section II, we shed light on Opinion Mining and discuss the different sub-problems under the larger problem, correlating these sub-problems with our problem of editorial analysis and synthesis. In section III, we give an overview of related work and highlight the novelties of our own. In section IV, we discuss the linguistic basis for distinguishing facts and opinions. In section V, we address the linguistic aspects of determining the strength of opinions. In section VI, we turn to one of the crucial components of our research work – outlining the argumentation structure of editorials (support and rhetorical relations) – and briefly discuss the semantic tagset employed for annotation. In section VII, we report our ongoing work on editorial collection and annotation.

II. OPINION MINING AS A PROBLEM

Although Opinion Mining has emerged only recently as a subdiscipline of computational linguistics, a considerable amount of work has already been done in this direction. This work spans a variety of task domains, from mining product reviews on the web and sentiment classification of documents to opinion mining and summarization. Irrespective of the specific task, Opinion Mining generally encompasses the following generic problems:

1. Determining subjectivity, i.e., identifying the subjective and objective expressions in texts [1, 2, 7].
2. Determining the orientation or polarity of the subjective expressions [3, 4, 5, 6, 11].
3. Determining the strength of the orientation of the subjective expressions [8]. This involves deciding whether a Positive or Negative opinion expressed in a text is Weakly, Mildly or Strongly Positive/Negative.

Generally, statements presenting facts are characterized by the presence of certain verbs such as "declare" and by different tense and number forms of the verb "be". Moreover, statements interpreted as facts are generally accompanied by some reliable authority providing evidence for the claim.
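Surface cues like these can be turned into a first-pass labeller. The sketch below is a minimal illustration only; all cue lists are our own assumed stand-ins, not the authors' actual lexicons, and a real system would need far richer resources and context handling.

```python
# Sketch of a cue-based fact/opinion labeller in the spirit of the criteria
# discussed here. All cue lists are illustrative assumptions.

EPISTEMIC = {"i think", "i believe", "i feel", "in my opinion"}
DOUBT = {"may", "might", "could", "possibly", "probably", "perhaps"}
EVALUATIVE = {"ugly", "disgusting", "wonderful", "terrible"}  # assumed examples
REPORT_VERBS = {"declare", "declares", "declared", "is", "are", "was", "were"}

def label_sentence(sentence: str) -> str:
    """Label a sentence as 'opinion', 'fact' or 'undefined' from surface cues."""
    s = sentence.lower()
    words = set(s.replace(",", " ").replace(".", " ").split())
    # Epistemic phrases and doubt/evaluative words mark opinions.
    if any(phrase in s for phrase in EPISTEMIC):
        return "opinion"
    if words & DOUBT or words & EVALUATIVE:
        return "opinion"
    # Report verbs and forms of "be" suggest a factual statement.
    if words & REPORT_VERBS:
        return "fact"
    return "undefined"

print(label_sentence("I believe the elections will take place this year."))  # opinion
print(label_sentence("The commission declared the results."))  # fact
```

Note that the doubt check runs before the report-verb check, so a hedged sentence such as "The decision was possibly premature" is labelled an opinion despite containing "was".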
Below, in Table 1, two examples of factual sentences and their respective sources of evidence are presented.

Table 1. Facts and reliable authority
  Fact: Both the dates announced for the constituent assembly (CA) elections came and went without the vote taking place.
    Reliable authority: Election Commission for CA elections, 2007.
  Fact: We have fewer people getting killed every day.
    Reliable authority: Nepal Police Department of Crime and Investigation (December 2007).

Our problem of analyzing editorials essentially involves all of problems 1-3 above. Additionally, it requires opinion analysis at higher levels, i.e., the discourse level. We discuss this in later sections.

III. RELATED WORKS

Although our work belongs to the general class of Opinion Mining, it also encompasses the analysis of the argumentation structure of editorial texts, which follows from Argumentation Theory. From the Opinion Mining perspective, our work is closest to [9, 10, 11]. While [9] employs major topic detection and the concept of relevant sentences for opinion analysis and summarization, [10] additionally addresses opinion tracking using sentiment scores. Similarly, [11] focuses on finding opinion holders and the topics expressed in online news media text. All of these only partially address our problem. As for Argumentation Theory, work in the AI and Law community has looked at the annotation of legal texts using argumentation schemes [12, 13], a popular way of categorizing patterns of argument appropriate to particular contexts. Similarly, [14] examines the strength of arguments based on the perspectives held by the audience to whom the argument is addressed, and [15] analyzes the structure of argumentative discourse and produces a model for it.

Hence, although works that partially address our problem exist in more than one research domain, to the best of our knowledge no existing work embraces the two fields – Opinion Mining and Argumentation Theory – for the analysis and construction of a synthesis of opinion arguments from one or more editorials on a common topic. Our work does. It takes its basics for opinion identification, extraction and strength determination at the word and phrase level from Opinion Mining, whereas opinion analysis at the discourse level is conducted on the basis of Argumentation Theory, analyzing the argumentation structure found in editorials.

IV. LINGUISTIC BASIS FOR DISTINGUISHING FACTS AND OPINIONS

Since editorials are usually a mix of facts and opinions, there is clearly a need to distinguish between them. Opinions often express an attitude towards something: a judgment, a view, a conclusion, or even an opinion about other opinion(s). Different approaches have been suggested for distinguishing facts from opinions. Opinions are characterized by evaluative expressions of various sorts, such as the following [17]:

a) Presence of evaluative adverbs and adjectives in sentences – "ugly" and "disgusting".
b) Expressions denoting doubt and probability – "may be", "possibly", "probably", "perhaps", "may", "could", etc.
c) Presence of epistemic expressions – "I think", "I believe", "I feel", "In my opinion", etc.

Obviously, the distinction between the two is not always straightforward. Facts may well be opinions in disguise and, in such cases, the intention of the author as well as the reliability of the information needs to be verified. To make a finer distinction between facts and opinions, and within opinions themselves, we propose the gradation of opinions shown in Table 2.

Table 2. Gradation of opinions
  Opinion type             Global definition
  Hypothesis statements    Explains an observation.
  Theory statements        Widely believed explanation.
  Assumptive statements    Improvable predictions.
  Value statements         Claims based on personal beliefs.
  Exaggerated statements   Intended to sway readers.
  Attitude statements      Based on implied belief system.
Source: [www.clc.uc.edu/documents_cms/TLC/Fact_and_Opinion.ppt]

For the purpose of developing a linguistic base for identifying opinions (opinion words or phrases) in texts, we maintain a Polarity lexicon of opinion words and expressions collected from the corpus, categorized into prototypically positive and negative sets. Next, by consulting available electronic resources such as dictionaries, thesauri and WordNet, we manually increase the size of the lexicon by introducing synonyms of the entries already compiled from the corpus. This yields a rich collection of opinions, both context-dependent (phrases from the corpus) and context-independent (words from the dictionary and other resources). Moreover, as part of lexicon building, we group semantically similar members of the larger sets into smaller subsets. Table 3 provides a sample of the polarity lexicon.

Table 3. Polarity lexicon
  Positive:
    Peace – {peace(n), peaceful(adj), accord(n), pact(n), treaty(n), pacification(n), pacify(v), peacefulness(n), serenity(n)}
    Happy – {happy(adj), happiness(n), felicitous(adj), glad(adj), willing(adj), felicity(n)}
  Negative:
    Infamy – {infamy(n), discredit(n), disrepute(n), notoriety(n), infamous(adj), dishonor(n), notorious(adj)}
    Height of impunity, drama of consensus.

V. STRENGTH OF OPINIONS

Besides detecting the polarity of opinions as Positive, Negative or Neutral, it is equally important to determine the strength of the opinions present in a text (Weak, Strong, Mildly Weak, Mildly Strong, etc.). For this purpose, we have developed Intensifier and Pre-modifier lexicons, consisting of adverbs and of pre-modifiers, the latter coming in front of adverbs and adjectives. Both intensifiers and pre-modifiers play a role in conveying greater or lesser emphasis. Intensifiers are reported to have three different functions – emphasis, amplification and downtoning. In Table 4, we present a sample of the intensifiers.

Table 4. Intensifier lexicon
  Type         Value
  Emphasizers  Really: truly, genuinely, actually. Simply: merely, just, only, plainly. Literally. For sure: surely, certainly, sure, for certain, sure enough, undoubtedly. Of course: naturally.
  Amplifiers   Completely: all, altogether, entirely, totally, whole, wholly. Absolutely: totally and definitely, without question, perfectly, utterly. Heartily: cordially, warmly, with gusto and without reservation.
  Downtoners   Kind of: sort of, kinda, rather, to some extent, almost, all but. Mildly: gently.
Source: [www.grammar.ccc.commnet.edu/grammar/adverbs.htm]

We include an example for each of the above categories of intensifiers, showing their role in changing the strength of opinions:

  Bad – Low; Really bad – High
  Quiet – Low; Absolutely quiet – High
  Friendly – Average; Sort of friendly – Low

Similarly, in Table 5, we present a sample of the pre-modifiers and show their contribution to the overall strength of the expressions.

Table 5. Pre-modifier lexicon
  Adverb/Adjective (Strength)   Pre-modifier   Resulting strength
  Fast (Low)                    Very           Very fast (High)
  Careful (Low)                 Lot more       Lot more careful (Average)
  Better (Average)              Much           Much better (High)
  Better (Average)              Much much      Much much better (High)
  Serious (Low)                 Much more      Much more serious (High)
  Good (Low)                    Somewhat       Somewhat good (Average)
  Good (Low)                    Quite          Quite good (Average)
Source: [www.grammar.ccc.commnet.edu/grammar/adjectives.htm#a-_adjectives]

We are also currently working on report and modal verbs and their respective roles in determining the strength of opinions. Their precise contribution is still subject to further study.

VI. OUTLINING THE SUPPORT AND RHETORICAL RELATIONS IN EDITORIALS

From opinion mining and analysis at the word, phrase and sentence level, we now move to higher levels of analysis: the discourse level, the style of writing, the political affiliations of the editorials, and so on. Here arises the necessity of outlining the argumentation structure of editorials. Editorials exhibit an argumentation structure consisting of a conclusion statement, which is in turn supported by other statements (the supports) for or against the conclusion. Both the supports and the conclusion can be either facts or opinions. The supports may be further developed by means of text fragments linked by rhetorical relations. We have been working towards analyzing the argumentation structure of editorials and thus determining the persuasiveness inherent in texts. The result is a discourse-level analysis of opinions producing a semantic representation. Ultimately, the analyzed argumentation structure is used to construct a synthesis of positive and negative arguments from one or several editorials (single or multiple sources) over a common date or a span of time. Table 6 presents the semantic tagset employed for annotation. In our semantic and pragmatic representation of editorials, the root node is a conclusion.
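This conclusion-and-supports representation can be sketched as a small data model. The class and field names below are our own illustrative choices, not structures defined by the paper, and the attribute sets are simplified.

```python
# Minimal data model for the argumentation structure described here:
# a conclusion root node, attributed supports, and rhetorical relations
# linking supports. Names and fields are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Support:
    id: int
    date: str
    source: str
    orientation: str            # "Positive" | "Negative" | "Neutral"
    support_type: str           # "For" | "Against"
    conditional: bool = False   # does the support depend on other supports?
    direct_strength: str = "Low"  # "Low" | "Average" | "High"

@dataclass
class RhetoricalRelation:
    kind: str                   # e.g. "Justification", "Elaboration"
    source_id: int              # support that develops another
    target_id: int              # support being developed

@dataclass
class Conclusion:
    date: str
    source: str
    orientation: str
    strength: str
    supports: list = field(default_factory=list)
    relations: list = field(default_factory=list)

# Building a small structure in the style of the paper's example:
c = Conclusion("2007-12-28", "KTMPOST", "Positive", "High")
c.supports.append(Support(1, "2007-12-28", "KTMPOST", "Positive", "For"))
c.supports.append(Support(2, "2007-12-28", "KTMPOST", "Positive", "For",
                          conditional=True))
c.relations.append(RhetoricalRelation("Justification", 2, 1))
print(len(c.supports))  # 2
```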
The conclusion node carries the attributes polarity (Positive, Negative or Neutral), date and source. The root node is associated with one or more supports. The support relations have the attributes date, source, orientation of support (for or against), reporting level (characterized by report verbs and modal verbs expressing different levels of commitment), conditional level (Yes or No, indicating the support's dependence on other supports), and strength of the argument (in terms of direct strength, relative strength and persuasion effect).

Table 6. Semantic tagset
  Parameter               Possible values
  Argument_type           Support, Conclusion, Rhetorical_relation
  Expression_type         Fact, Opinion, Undefined
  Fact_authority          Yes, No
  Opinion_orientation     Positive, Negative, Neutral
  Orientation_support     For, Against
  Id                      Id number of the support
  Date                    Date of publication of the editorial
  Source                  Source or name of the newspaper
  Commitment              Modal, Low, High
  Conditional             Yes, No
  Direct-strength         Low, Average, High
  Relative-strength       Low, Average, High
  Persuasion-effect       Low, Average, High
  Rhetoric_relation type  Exemplification, Contrast, Discourse frame, Justification, Elaboration, Paraphrase, Cause-effect, Result, Explanation, Reinforcement

Next, for our purpose of editorial analysis, we use the following rhetorical relations (Marcu, 1997) [16]:

• Exemplification: illustrates a support, giving it a higher strength and persuasion effect.
• Contrast: relates two supports A and B, where A and B are both true while partly contradicting each other; they are in general linked by connectors such as nevertheless, although, but, even if, etc.
• Discourse frame: introduces a factual statement which indicates the environment and scope of the conclusion (time, facts, etc.).
• Justification: B gives reasons for and explains A; this relation is stronger than the explanation relation.
• Elaboration: B is an elaboration of A if it develops or describes a part of A.
• Paraphrase: another way of stating the support or conclusion, adding strength to the statement.
• Cause-effect: establishes a causal relationship between supports.
• Result: B results at least partly or indirectly from A.
• Explanation: B is an explanation of A if it indicates the reasons for A in a quite neutral way.
• Reinforcement: B reinforces A; it is stronger than an elaboration, an exemplification or an explanation, and in general contains specific marks related to confirmation, enforcement, etc.

Below, we present an example of the argumentation structure from our corpus of editorials, partly using the semantic tagset we have developed (see Table 6). The semantic tagset, which is now more or less stabilized, is the result of a careful study and analysis of the raw corpus from the perspective of opinion analysis and argumentation outlining. It combines aspects of both Opinion Mining and Argumentation Theory, thus conforming to our needs.

Conclusion: (<Date: 2007-12-28>, <Source: KTMPOST>, <Orientation: Positive>, <Strength: High>)
  [CA elections] will take place in 2008.
Support: (<ID: 1>, <Date: 2007-12-28>, <Source: KTMPOST>, <Orientation: Positive, Support type: For>, <Strength: Low>)
  The Post believes that the long awaited and ever elusive [CA elections] will take place this year.
Support: (<ID: 2>, <Date: 2007-12-28>, <Source: KTMPOST>, <Orientation: Positive, Support type: Conditional, For>, <Strength: Low>)
  If we behave responsibly, we will be able to hold the [CA elections].
Rhetorical_relation: Justification(1, 2)  // Support 2 is a Justification of Support 1
Support: (<ID: 3>, <Date: 2007-12-28>, <Source: KTMPOST>, <Orientation: Positive, Support type: Conditional, For>, <Strength: Low>)
  If the Maoists do not run away from elections, if the recently formed and old parties of the terai live up to their promises to allow [elections] to happen…
Rhetorical_relation: Justification(1, 3)  // Support 3 is a Justification of Support 1

In the example above, the conclusion is characterized by a vector containing id, date, source, orientation and strength. The conclusion is followed by supports and rhetorical relations.
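The synthesis step over such a structure can be sketched as a signed, strength-weighted aggregation of supports for and against the conclusion. The numeric weights below are our own illustrative assumption; the paper does not specify a scoring scheme.

```python
# Toy aggregation of supports: weight each support by its direct strength
# and sum signed contributions for/against the conclusion.
# The numeric weights are illustrative assumptions, not values from the paper.

STRENGTH_WEIGHT = {"Low": 1, "Average": 2, "High": 3}

def score_conclusion(supports):
    """supports: list of (support_type, direct_strength) pairs.
    Returns a signed score; positive means the supports favour the conclusion."""
    total = 0
    for support_type, strength in supports:
        sign = 1 if support_type == "For" else -1
        total += sign * STRENGTH_WEIGHT[strength]
    return total

# The three "For" supports of the example, all of direct strength Low:
example = [("For", "Low"), ("For", "Low"), ("For", "Low")]
print(score_conclusion(example))  # 3
```

A mixed case such as one High support for and one Average support against would yield 3 - 2 = 1, a weakly favourable balance; a fuller treatment would also fold in the relative-strength and persuasion-effect attributes once they are incorporated.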
Supports are described in the same manner as the conclusion. Referential expressions are put inside square brackets, binding the supports to the event reported in the conclusion; [CA elections] and [elections] are the referential expressions in the example above. Similarly, the underlined text portions are the opinion anchors, i.e., those terms that a priori mark a statement as an opinion. For strength, we currently consider only the direct-strength attribute; the other two attributes will be incorporated gradually. The rhetorical relations further develop the supports and are characterized by the links existing between supports.

VII. TEXT COLLECTION AND ANNOTATION

With the aim of developing training data for tagging and subsequently analyzing editorial texts, editorials have been collected from at least three different sources; the collected texts serve as the corpus for our research. The editorials represent a common theme – socio-political – and subtheme – peace and stability – and are taken from different dates towards the end of 2007 and the beginning of 2008, amounting to more than 300 text files, with a total of 6000 sentences and an average of 20 sentences per editorial. The texts are taken respectively from The Kathmandu Post Daily (http://ekantipur.com/ktmpost.php), The Nepali Times Weekly (http://nepalitimes.com.np) and The Spotlight Weekly (http://nepalnews.com/spotlight.php). We plan to extend the collection with editorials from both national and international newspapers covering a wide range of domains such as society, culture, health and education.

The collected texts have been annotated by two annotators with a fairly good understanding of English. The annotators were assigned the same texts in order to see how semantic annotations differ between annotators. Results show that the difficulties in manual annotation lie at two levels: determining the orientation (polarity) of words or expressions, and evaluating their strengths for the three strength attributes – direct-strength, relative-strength and persuasion-effect. Wherever the annotators were unsure about providing one particular value, they were instructed to provide multiple values separated by commas. The annotations were then exchanged for peer review, essentially to determine the inter-annotator disagreement rates. Disagreements were noted in picking values for the attributes Expression_type, Opinion_orientation, Orientation_support, Commitment, Direct-strength, Relative-strength, Persuasion-effect and Rhetoric_relation type. Disagreement was most frequent for the tag Expression_type (one in every five tagged words), followed by Opinion_orientation (one in every ten tagged words), Orientation_support (one in every fifteen tagged words) and so on. The disagreements were resolved by mutual discussion as well as consultation with expert linguists.

In Fig. 1, we present a diagrammatic representation of the argumentation structure of an editorial using the Athena software available at http://www.athenasoft.org.

Fig. 1. Diagrammatic representation of the argumentation structure of an editorial.

In the diagrammatic representation, the topmost node is the conclusion, followed by child nodes below. The nodes highlighted in green represent positive supports, whereas those in red are counterarguments, or negative supports, to the conclusion.
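The inter-annotator disagreement rates reported earlier (e.g., one disagreement in every five tagged words for Expression_type) can be estimated with a simple pairwise comparison. The sketch below is ours, under assumed data structures; it also honours the convention that an unsure annotator may give several comma-separated values, counting agreement whenever the value sets overlap.

```python
# Pairwise inter-annotator disagreement per attribute. The input format
# (one value string per tagged token, possibly comma-separated) is an
# illustrative assumption, not the project's actual annotation format.

def disagreement_rate(ann_a, ann_b):
    """ann_a, ann_b: parallel lists of attribute values for the same tokens.
    Agreement is counted when the annotators' value sets overlap."""
    assert len(ann_a) == len(ann_b), "annotations must cover the same tokens"
    disagree = sum(
        1 for a, b in zip(ann_a, ann_b)
        if not (set(a.split(",")) & set(b.split(",")))
    )
    return disagree / len(ann_a)

# Five tokens, one clear disagreement: a rate of 0.2, i.e. one in five.
a = ["Fact", "Opinion", "Opinion,Undefined", "Fact", "Opinion"]
b = ["Fact", "Fact", "Undefined", "Fact", "Opinion"]
print(disagreement_rate(a, b))  # 0.2
```

A production version would likely use a chance-corrected measure such as Cohen's kappa rather than raw disagreement.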
The text in the yellow box presents detailed information on each node – in our case the attribute-value pairs for date, source, orientation of support, strength, etc. – which can be entered while developing the diagram and read by moving the mouse cursor to the respective node.

ACKNOWLEDGMENT

We would like to thank Prof. Patrick Hall for his continuous support of and inspiration for this work. This work was partly supported by the French STIC-Asia programme. Thanks are also due to Madan Puraskar Pustakalaya, Nepal, for its support of this work.

REFERENCES

[1] J. M. Wiebe, R. F. Bruce, and T. P. O'Hara, "Development and use of a gold-standard data set for subjectivity classifications," in Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, College Park, Maryland, 1999.
[2] V. Hatzivassiloglou and J. M. Wiebe, "Effects of adjective orientation and gradability on sentence subjectivity," in Proceedings of the 18th Conference on Computational Linguistics (COLING), Saarbrücken, Germany, 2000, pp. 299-305.
[3] V. Hatzivassiloglou and K. R. McKeown, "Predicting the semantic orientation of adjectives," in Proceedings of the 35th Annual Meeting of the ACL and the 8th Conference of the European Chapter of the ACL, Madrid, Spain, 1997, pp. 174-181.
[4] P. D. Turney, "Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews," in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, 2002.
[5] A. Esuli and F. Sebastiani, "Determining the semantic orientation of terms through gloss classification," in Proceedings of the 14th ACM International Conference on Information and Knowledge Management, Bremen, Germany, 2005.
[6] B. Pang, L. Lee, and S.
Vaithyanathan, "Thumbs up? Sentiment classification using machine learning techniques," in Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2002.
[7] E. Riloff, J. Wiebe, and T. Wilson, "Learning subjective nouns using extraction pattern bootstrapping," in Proceedings of the Seventh Conference on Natural Language Learning (CoNLL-03), 2003.
[8] T. Wilson, J. Wiebe, and P. Hoffmann, "Recognizing contextual polarity in phrase-level sentiment analysis," in Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, Vancouver, British Columbia, Canada, 2005.
[9] L.-W. Ku, L.-Y. Lee, and T.-H. Wu, "Major topic detection and its application to opinion summarization," in SIGIR 2005, 2005, pp. 627-628.
[10] L. Ku, Y. Liang, and H. Chen, "Opinion extraction, summarization and tracking in news and blog corpora," in Proceedings of the AAAI-2006 Spring Symposium on Computational Approaches to Analyzing Weblogs, 2006.
[11] S.-M. Kim and E. Hovy, "Extracting opinions, opinion holders, and topics expressed in online news media text," in Proceedings of the ACL/COLING Workshop on Sentiment and Subjectivity in Text, Sydney, Australia, 2006.
[12] M.-F. Moens, E. Boiy, R. M. Palau, and C. Reed, "Automatic detection of arguments in legal texts," in Proceedings of the 11th International Conference on Artificial Intelligence and Law, Stanford, California, 2007, pp. 225-230.
[13] D. N. Walton, Argumentation Schemes for Presumptive Reasoning. Mahwah, NJ: Lawrence Erlbaum Associates, 1996.
[14] T. J. M. Bench-Capon, "Agreeing to differ: Modelling persuasive dialogue between parties without a consensus about values," Informal Logic, vol. 22, no. 3, pp. 231-245, 2002.
[15] R. Cohen, "Analyzing the structure of argumentative discourse," Computational Linguistics, vol. 13, pp. 11-24, 1987.
[16] D. Marcu, "The rhetorical parsing of natural language texts," in Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and the 8th Conference of the European Chapter of the Association for Computational Linguistics, Madrid, Spain, 1997, pp. 96-103.
[17] K. Dunworth, UniEnglish reading: distinguishing facts from opinions, 2008. http://unienglish.curtin.edu.au/local/docs/RW_facts_opinions.pdf