Towards Building Annotated Resources for Analyzing Opinions and Argumentation in News Editorials

Bal Krishna Bal, Patrick Saint-Dizier

Information and Language Processing Research Lab, Department of Computer Science and Engineering, Kathmandu University, P.O. Box 6250, Dhulikhel, Nepal
IRIT-CNRS, 118 route de Narbonne, 31062 Toulouse, France
E-mail: [email protected], [email protected]

Abstract

This paper describes an annotation scheme for argumentation in opinionated texts such as newspaper editorials, developed from a corpus of approximately 500 English texts from Nepali and international newspaper sources. We present the results of the analysis and evaluation of the corpus annotation; the current inter-annotator agreement kappa value of 0.80 indicates substantial agreement between the annotators. We also discuss some of the linguistic resources developed as a result of our corpus analysis (key factors for distinguishing facts from opinions, an opinion lexicon, an intensifier lexicon, a pre-modifier lexicon, a modal verb lexicon, a reporting verb lexicon, general opinion patterns from the corpus, etc.), which can be used to identify an opinion or a controversial issue, the arguments supporting an opinion, the orientation of the supporting arguments, and their strength (intrinsic, relative and in terms of persuasion). These resources form the backbone of our work, especially for performing opinion analysis at the lower levels, i.e., the lexical and sentence levels. Finally, we discuss the perspectives of this work and clearly outline its challenges.

1. Introduction

There have been growing efforts to develop annotated resources that can be used to acquire annotation patterns, e.g. via statistical or machine learning approaches, and ultimately to aid the automatic identification, extraction and analysis of opinions, emotions and sentiments in texts. Such work on text annotation includes, among many others, Wilson (2003), Stoyanov and Cardie (2004), Wiebe et al. (2005), Wilson (2005) and Read et al. (2007). These works are primarily focused on annotating opinions or appraisal units (attitude, engagement and graduation) in texts, following annotation schemes that share the notions of the Appraisal Framework developed by Martin and White (2005). Other work on annotating texts includes Carlson et al. (2001) and Taboada and Renkema (2008) (http://www.sfu.ca/rst/06tools/discourse_relations_corpus.html), which deal with text annotation at the discourse level, employing discourse connectives and rhetorical relations. Interest in the analysis of opinionated texts such as news editorials has also grown in recent years, for example in Elisabeth Le's work. However, despite these efforts, a suitable annotation scheme for corpus annotation from the perspective of opinion and argumentation analysis in news editorials is clearly missing. While the existing annotation schemes and guidelines may be sufficient for annotating appraisal units, discourse connectives, discourse units and possibly even some rhetorical relations, we argue that for analyzing the various forms of argumentation it is necessary to determine the type of support with respect to an argument (either "For" or "Against") and the persuasion levels and effects in opinionated texts.
This requires additional provisions in the annotation scheme for addressing these issues, which include:
- the introduction of metadata for the source text, such as the date and source of publication, useful for source attribution during opinion and argumentation analysis,
- the introduction of the date of the opinion or event, for temporal analysis (opinion evolution),
- parameters for identifying arguments and for determining the orientation of their supports,
- the evaluation of strength-determining attributes such as levels of commitment (expressed via the use of different reporting and modal verbs),
- and other forms of expression indicating direct, relative and persuasion strengths (mostly words or phrases consisting of one or more adjectives, adverbs, intensifiers and pre-modifiers, in combination or in isolation).

Annotating argumentation in opinionated texts, and particularly in news editorials, is certainly not an easy task, because the underlying process can be very involved depending on the structure of the editorials. Our experience has shown that this is a complicated process, as the argumentation structure in such texts does not necessarily resemble the standard forms of rational thinking or reasoning. Similarly, the editorial argumentation structure does not always fit one of the argumentation schemes discussed by Toulmin (2008): "Data→Warrant→Conclusion" or "Data→Warrant→Unless Rebuttal→Conclusion". To make things even more complicated, the association of the strength of an argument with the standard notion of validity does not hold exactly for persuasive texts, and hence for editorials. Other parameters, such as specific exceptions and exclusions, also need to be taken into consideration, as argued by Kolflaath (2007). In persuasive texts, an argumentation analysis needs to take into account several correlated aspects such as text, author, purpose, audience and context, as described by the Silva Rhetoricae (www.rhetoricae.byu.edu). This encompasses rhetorical analysis, an important domain of engagement in text analysis. Finally, editorials abound with irony, over-emphasis, provocations, etc., which tend to prevent the reader from making an objective analysis.

2. Building a Corpus of Editorials and Making Preparations for Annotation

We have collected editorials from three different sources published in Nepal and from a number of sources in India and English-language journals around the world. The selected editorials share a common theme, "socio-political", and span dates from late 2007 to mid-2009, amounting to a total of around 500 text files, with a total of 8,000 sentences and an average of 20 sentences per editorial. The texts covering events in Nepal are taken from journals with varying political affiliations: The Kathmandu Post Daily (http://ekantipur.com/ktmpost.php), The Nepali Times Weekly (http://nepalitimes.com.np) and The Spotlight Weekly (http://nepalnews.com/spotlight.php). We have also added to our collection editorials from online electronic resources (http://nepalmonitor.com) that collect editorials from different national and international newspapers on a monthly basis. These editorials cover some of the prime events that took place in Nepal or around the world in a particular month.
Considering non-Nepali journals is significant because their styles are quite different and their analysis is less committed to a particular local political party or orientation.

In preparing the editorials for annotation, it should be noted that not all portions of an editorial are useful for argumentation or opinion analysis. An important point for identifying argumentation in texts is that a whole argumentation unit is in general composed of at least two parts: a claim part and a justification part (respectively, a conclusion and its support). The claim part puts forward an opinion or statement on an issue, and the justification part provides support for that statement. How persuasive an argument is, and what persuasion effect it has, largely depends on how effectively the justification part is employed and on its contextual environment. Similarly, rhetorical relations alter the strengths of these supports and, correspondingly, of the arguments. Non-argumentative units of the text, which play no role in providing direct or indirect support to an argument (in this case the claim), may be ignored during annotation. For illustration purposes, we present an excerpt of an editorial text with its argumentative units underlined in Table 1:

Opening statement/claim/conclusion: 2007 was a violent year full of conflicts and confusions.

Text: Nepali leftists have always had a flair for pompous rhetoric. Pushpa Kamal Dahal and BabuRam Bhattarai insist on using a paragraph to say what they can in one sentence. So we have a 23-point agreement among the seven parties in which the communists commit themselves, once again, to constituent assembly elections. Nepal has been declared a republic, but it will only take formal effect sometime in the middle of next year, after it is ratified by the constituent assembly. But the king is in his palace, still paid a salary with tax-payers' money.

Table 1: Argumentative and non-argumentative fragments in editorial text

In the text above, the content is analyzed with respect to the opening statement or claim, which is also sometimes referred to as the concluding statement. The analysis shows that the text portion in italics belongs to the non-argumentative unit, as it conveys no direct or indirect association with the opening statement or claim. The remaining blocks of text, with certain underlined fragments, belong to the argumentative part, as they convey, possibly via inferences, support for the opening statement. The labeling of text fragments as argumentative or non-argumentative in the above text is purely manual and based on human analysis; automating it would clearly require sophisticated linguistic methods and analysis.

As part of the corpus analysis, we are also studying argumentation schemes in news editorials by carefully analyzing the general argumentation structure found in the corpus. While we will also look at more general argumentation schemes (deductive, inductive and presumptive), our major focus is on identifying the structure and occurrences of defeasible schemes in news editorials. This information is vital for analyzing the change of opinions over time. We will attempt to apply the identified argumentation schemes to classify the different argumentation instances from the corpus.
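As a rough illustration of how a first pass at the claim/justification decomposition described above might be automated, the following Python sketch labels sentences by surface cues alone. The cue lists and the three-way labelling are our own illustrative assumptions, not the procedure actually used here (which is manual); a serious implementation would need the inference-aware analysis just described.

```python
import re

# Hypothetical cue lists: the criteria used in this paper are richer and
# partly inference-based, so this is only a naive surface-level filter.
JUSTIFICATION_CUES = {"because", "since", "so", "therefore", "hence"}
CLAIM_CUES = {"must", "should", "clearly", "obviously"}

def split_sentences(text):
    """Very rough sentence splitter, for illustration only."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def label_units(text):
    """Label each sentence as a candidate claim, justification,
    or non-argumentative unit, based on surface cues alone."""
    labelled = []
    for sentence in split_sentences(text):
        words = {w.lower().strip(",.;:!?") for w in sentence.split()}
        if words & JUSTIFICATION_CUES:
            labelled.append((sentence, "justification"))
        elif words & CLAIM_CUES:
            labelled.append((sentence, "claim"))
        else:
            labelled.append((sentence, "non-argumentative"))
    return labelled

# Example: the connective "So" flags the third sentence of the Table 1
# excerpt as a candidate justification.
```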
3. An Overview of the Annotation Scheme

As a result of the analysis of our corpus of editorials, we have developed a semantic tag set specifically designed for the annotation of editorials; a sample is shown in Table 2. The values associated with the tags are subject to change.

Parameter: possible values
- Argument_type: Support, Conclusion, Rhetorical_relation
- Expression_type: Fact, Opinion, Undefined
- Fact_authority: Yes, No
- Opinion_orientation: Positive, Negative, Neutral
- Orientation_support: For, Against
- ID: id number of the support
- Date: date of publication of the editorial
- Source: source or name of the newspaper
- Commitment: Modal, Low, High
- Conditional: Yes, No
- Direct-strength: Low, Average, High
- Relative-strength: Low, Average, High
- Persuasion-effect: Low, Average, High
- Rhetoric_relation type: Exemplification, Contrast, Discourse Frame, Justification, Elaboration, Paraphrase, Cause-effect, Result, Explanation, Reinforcement

Table 2: Semantic tag set

Below, we briefly explain the designated purpose of each of these tags:
- Fact_authority: the level of authority of the author,
- Orientation_support: for or against the controversial issue being studied,
- Conditional: whether the utterance contains a conditional expression that limits its scope,
- Direct_strength: the intrinsic strength of the utterance, measured from the terms it contains,
- Relative_strength: the strength of the utterance in comparison with the other utterances of the same document; the idea is to take the author's style into account,
- Persuasion_effect: the persuasive force of the utterance, which is quite different from its strength,
- Rhetoric_relation: a specific tag introduced to identify utterances which stand in a rhetorical relation to others; these are not necessarily arguments or opinions.
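To illustrate how an annotated unit might be represented programmatically, here is a minimal Python sketch. The class, the field names and the example values are our own illustrative choices (the corpus itself is annotated in XML, as described in Section 8); only the allowed values are transcribed from Table 2.

```python
from dataclasses import dataclass

# Allowed values transcribed from Table 2; only the closed-class tags are
# validated here. The representation itself is illustrative, not the XML
# format actually used for the corpus.
ALLOWED = {
    "argument_type": {"Support", "Conclusion", "Rhetorical_relation"},
    "expression_type": {"Fact", "Opinion", "Undefined"},
    "opinion_orientation": {"Positive", "Negative", "Neutral"},
    "orientation_support": {"For", "Against"},
    "direct_strength": {"Low", "Average", "High"},
    "relative_strength": {"Low", "Average", "High"},
    "persuasion_effect": {"Low", "Average", "High"},
}

@dataclass
class AnnotatedUnit:
    text: str
    argument_type: str
    expression_type: str
    opinion_orientation: str
    orientation_support: str
    direct_strength: str
    relative_strength: str
    persuasion_effect: str
    support_id: int      # ID: id number of the support
    date: str            # date of publication of the editorial
    source: str          # source or name of the newspaper

    def __post_init__(self):
        # Reject any value outside the tag set of Table 2.
        for name, allowed in ALLOWED.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} not in {sorted(allowed)}")

# Invented example values, for illustration only.
unit = AnnotatedUnit(
    text="But the king is in his palace, still paid a salary.",
    argument_type="Support", expression_type="Opinion",
    opinion_orientation="Negative", orientation_support="For",
    direct_strength="Average", relative_strength="Average",
    persuasion_effect="High", support_id=3,
    date="2007-12-28", source="The Nepali Times Weekly",
)
```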
The semantic tag set above has been tested for coverage by annotating some 56 editorials of varying topics and structures from our corpus. Annotation with the tag set has been carried out manually as well as with automatic diagramming tools such as Araucaria (http://araucaria.computing.dundee.ac.uk/doku.php) and Athena (http://www.athenasoft.org/). While we found the tools quite robust at producing argumentation diagrams and, in the case of Araucaria, the corresponding argumentation mark-up language (AML) text, they clearly lack a provision for introducing rhetorical relations into the argumentation outline. The Athena software does provide for assessing the strength of an argument on a [0-100]% scale; however, the specialization of argument strength into the varied parameters of our scheme (direct strength, relative strength and persuasion effect) is not available even there. We present examples of this annotation in diagrammatic form at the end of this document (Figures 1 and 2). In Figure 1, the labels attached to the broken and dotted arrows coming out of the conclusion and support nodes are rhetorical relations that further develop (paraphrase, explain, exemplify, produce results for, etc.) supports or conclusions. Such development via rhetorical relations adds persuasion effect or strength to the arguments (supports and conclusion). In the diagram shown in Figure 2, the root node of the tree structure is the conclusion or main thesis of the argumentation. The child nodes below the root are the positive and negative supports of the given conclusion. The positive supports, characterized as "For the Conclusion", are denoted by green nodes, while the negative supports, characterized as "Against the Conclusion", are represented in red. The yellow part presents the detailed information for each node (the conclusion and the supports), elaborated by the attributes date, source, orientation, strength, etc. The percentage figures reflect the subjective weights or estimates assigned by a human annotator or analyzer as to how convincing an argument is, and to what degree, in terms of providing either positive or negative support.

4. Linguistic Basis for Distinguishing Facts and Opinions

Since editorials are usually a mix of facts and opinions, there is a clear need to distinguish between them. Opinions often express an attitude towards something: a judgment, a view, a conclusion, or even an opinion about other opinion(s). Different approaches have been suggested for distinguishing facts from opinions. Generally, facts are characterized by the presence of certain verbs like "declare", specific tenses, or a number of forms of the verb "to be". Moreover, statements interpreted as facts are generally accompanied by some reliable authority providing evidence for the claim. Opinions, on the other hand, are characterized by evaluative expressions of various sorts, such as the following (Dunworth, 2008):
a) the presence of evaluative adverbs and adjectives in sentences, e.g. "ugly" and "disgusting";
b) expressions denoting doubt and probability, e.g. "may be", "possibly", "probably", "perhaps", "may", "could";
c) the presence of epistemic expressions, e.g. "I think", "I believe", "I feel", "In my opinion".

It is obvious that the distinction between facts and opinions is not straightforward. Facts may well be opinions in disguise and, in such cases, the intention of the author as well as the reliability of the information needs to be verified. To make a finer distinction between facts and opinions, and within opinions themselves, we propose a gradation of opinions, as shown in Table 3:

Opinion type: global definition
- Hypothesis statements: explain an observation.
- Theory statements: widely believed explanations.
- Assumptive statements: unprovable predictions.
- Value statements: claims based on personal beliefs.
- Exaggerated statements: intended to sway readers.
- Attitude statements: based on an implied belief system.

Table 3: Gradation of opinions. Source: [www.clc.uc.edu/documents_cms/TLC/Fact_and_Opinion.ppt]
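A minimal sketch of how criteria (a) to (c) could drive a first-pass opinion detector follows. The marker lists are small illustrative samples (real coverage would come from the lexicons of Section 5), and sentences without any marker simply default to "fact", which is exactly the coarse behaviour the caveat above warns about.

```python
# Small illustrative marker sets following criteria (a)-(c) above; a real
# system would draw these from the opinion lexicon of Section 5.
EVALUATIVE_WORDS = {"ugly", "disgusting", "pompous", "wonderful"}        # (a)
DOUBT_MARKERS = {"may", "could", "possibly", "probably", "perhaps"}      # (b)
EPISTEMIC_PHRASES = ("i think", "i believe", "i feel", "in my opinion")  # (c)

def looks_like_opinion(sentence: str) -> bool:
    """Flag a sentence as an opinion if it carries any marker from
    criteria (a)-(c); everything else defaults to 'fact', which is too
    coarse, since facts may be opinions in disguise."""
    lowered = sentence.lower()
    if any(phrase in lowered for phrase in EPISTEMIC_PHRASES):
        return True
    tokens = {t.strip(",.;:!?") for t in lowered.split()}
    return bool(tokens & (EVALUATIVE_WORDS | DOUBT_MARKERS))

print(looks_like_opinion("In my opinion, this move is disgusting."))  # True
print(looks_like_opinion("Nepal has been declared a republic."))      # False
```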
5. Maintaining an Opinion Lexicon

To develop a linguistic base for identifying opinions (opinion words or phrases) in texts, we built a dedicated sentiment/polarity lexicon of opinion words and expressions collected from the corpus and categorized into prototypically positive and negative sets. Next, by consulting available electronic resources such as dictionaries, thesauri and WordNet, we manually increased the size of the lexicon by introducing terms semantically related to the entries already compiled from the corpus. This yields a rich collection of opinions, both context dependent (phrases from the corpus) and context independent (words from dictionaries and other sources). Moreover, as part of the lexicon building, we group semantically similar members of the larger sets into smaller subsets. Below, we provide a sample of the sentiment/polarity lexicon in Table 4:

Positive:
- Peace: {peace (n), peaceful (adj), accord (n), pact (n), treaty (n), pacification (n), pacify (v), peacefulness (n), serenity (n)}
- Happy: {happy (adj), happiness (n), felicitous (adj), glad (adj), willing (adj), felicity (n)}

Negative:
- Infamy: {infamy (n), discredit (n), disrepute (n), notoriety (n), infamous (adj), dishonor (n), notorious (adj)}
- Height of impunity, drama of consensus.

Table 4: Sentiment/polarity lexicon

Besides detecting the polarity of opinions as positive, negative or neutral, it is equally important to determine the strength of the opinions (e.g. weak, strong, mildly weak, mildly strong, etc.). The widely used approach is to exploit comparative relations, i.e. adjective degrees (positive: low, comparative: average, superlative: high), but this approach is quite limited and can only be used on large collections of documents. We additionally suggest considering intensifiers, pre-modifiers, reporting verbs and modal verbs in this regard. We have developed intensifier and pre-modifier lexicons, consisting essentially of adverbs and of pre-modifiers that come in front of adverbs and adjectives. Both intensifiers and pre-modifiers play a role in conveying greater or lesser emphasis. Intensifiers are reported to have three different functions: emphasis, amplification and downtoning. We give a sample of the intensifier lexicon in Table 5.

Type: value
- Emphasizers: Really: truly, genuinely, actually. Simply: merely, just, only, plainly. Literally. For sure: surely, certainly, sure, for certain, sure enough, undoubtedly. Of course: naturally.
- Amplifiers: Completely: all, altogether, entirely, totally, whole, wholly. Absolutely: totally and definitely, without question, perfectly, utterly. Heartily: cordially, warmly, with gusto and without reservation.
- Downtoners: Kind of: sort of, kinda, rather, to some extent, almost, all but. Mildly: gently.

Table 5: Intensifier lexicon. Source: [www.grammar.ccc.commnet.edu/grammar/adverbs.htm]

We include an example for each of the above categories of intensifiers and their role in changing the strength of opinions:
- Bad (Low) vs. Really bad (High)
- Quiet (Low) vs. Absolutely quiet (High)
- Friendly (Average) vs. Sort of friendly (Low)

Similarly, we present a sample of the pre-modifiers and their contribution to the overall strength of expressions in Table 6:

Adverb/adjective (strength), pre-modifier, result (strength):
- Fast (Low); Very; Very fast (High)
- Careful (Low); Lot more; Lot more careful (Average)
- Better (Average); Much; Much better (High), Much much better (High)
- Serious (Low); Much more; Much more serious (High)
- Good (Low); Somewhat; Somewhat good (Average); Quite; Quite good (Average)

Table 6: Pre-modifier lexicon. Source: [www.grammar.ccc.commnet.edu/grammar/adjectives.htm#a-_adjectives]

Modal verbs and reporting verbs also play significant roles in determining the intent or commitment level expressed in verbal forms. We present a sample of the modal verb lexicon and its role in strength determination in Table 7:

Type; verb; strength effect:
- Ability/Possibility; Can; Average
- Ability/Possibility; Could; Low
- Permission; May; Average
- Permission; Might; Low
- Advice/Recommendation/Suggestion; Should; Average
- Necessity/Obligation; Must; High

Table 7: Modal verb lexicon

Finally, we present a sample of the reporting verb lexicon and its role in strength determination in Table 8.

Verb set; commitment/reporting/intent level; strength effect:
- # {invite, admit, agree, congratulate}; High; High
- ## {doubt, fear, complain, recommend, suggest, claim}; Average; Average
- ### {believe, note, say, tell, mention, suppose, guess, remark, wonder}; Low; Low

Table 8: Reporting verb lexicon

Note that the # set of verbs has a higher commitment or intent level and hence high strength effects, the ## set an average commitment or intent level and hence average or medium strength effects, and the ### set a low commitment or intent level and hence low strength effects.

The evaluation of these various forms of strength, based on the above factors, is under investigation via a metric that we are currently testing.
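Since the strength metric itself is still under investigation, the following Python sketch only illustrates one plausible way the lexicons above could be combined. The three-level scale, the sample entries and the decision to let emphasizers and amplifiers jump straight to High (as in "bad" vs. "really bad") are our assumptions, not the paper's metric.

```python
# Sample entries from Tables 5-8; the full lexicons are larger.
EMPHASIZERS = {"really", "simply", "literally", "surely", "certainly"}
AMPLIFIERS = {"completely", "absolutely", "totally", "utterly", "heartily"}
DOWNTONERS = {"rather", "mildly", "gently"}
MODAL_STRENGTH = {"can": "Average", "could": "Low", "may": "Average",
                  "might": "Low", "should": "Average", "must": "High"}

ORDER = ["Low", "Average", "High"]

def adjust_strength(base, modifier=None):
    """Emphasizers and amplifiers push an opinion word's strength to High,
    downtoners lower it one step; this simplifies Tables 5-6, where some
    pre-modifiers raise the strength by only one step."""
    if modifier is None:
        return base
    m = modifier.lower()
    if m in EMPHASIZERS or m in AMPLIFIERS:
        return "High"
    if m in DOWNTONERS:
        return ORDER[max(ORDER.index(base) - 1, 0)]
    return base

# Mirrors the worked examples above.
assert adjust_strength("Low", "really") == "High"      # bad -> really bad
assert adjust_strength("Average", "mildly") == "Low"   # cf. sort of friendly
assert MODAL_STRENGTH["must"] == "High"                # Table 7
```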
6. Discovering the General Opinion Patterns in the Corpus of Editorials

Since our ultimate aim is an accurate analysis of at least domain-dependent editorials, we need to define the general opinion patterns prevalent in editorials of the theme under investigation, "socio-political". In Table 9, we report a sample of such opinion patterns from the corpus. This work is still under development, together with the study of its extension to other domains.

Pattern type; example; orientation:
- Expression + negative nouns; "Height of anarchy, impunity, lawlessness"; Negative
- If negative event, then negative effect/result; "If the stalemate continues, state governance will come to a complete halt."; Negative
- Anti + (noun/adjective); "Anti-national, anti-social"; Negative
- Negation + negative verb; "No objection"; Positive
- Negative adjective + noun; "Unreasonable move"; Negative

Table 9: General opinion patterns from the corpus
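A sketch of how the Table 9 patterns might be operationalized with regular expressions is given below. The word lists are small illustrative stand-ins (in practice they would come from the sentiment/polarity lexicon of Section 5), and the regexes cover only the simplest surface forms of each pattern.

```python
import re

# Illustrative word lists; real coverage would come from the
# sentiment/polarity lexicon of Section 5.
NEG_NOUNS = r"(?:anarchy|impunity|lawlessness)"
NEG_VERBS = r"(?:object|oppose|refuse)"
NEG_ADJS = r"(?:unreasonable|unacceptable|irresponsible)"

OPINION_PATTERNS = [
    # (pattern type, regex, orientation), following Table 9
    ("expression + negative noun", rf"height of {NEG_NOUNS}", "Negative"),
    ("anti + noun/adjective", r"\banti-\w+", "Negative"),
    ("negation + negative verb", rf"\bno {NEG_VERBS}\w*", "Positive"),
    ("negative adjective + noun", rf"\b{NEG_ADJS} \w+", "Negative"),
]

def match_opinion_patterns(sentence):
    """Yield (pattern type, matched text, orientation) for each Table 9
    pattern found in the sentence."""
    for name, pattern, orientation in OPINION_PATTERNS:
        for m in re.finditer(pattern, sentence, re.IGNORECASE):
            yield name, m.group(0), orientation

for hit in match_opinion_patterns("There was no objection to this unreasonable move."):
    print(hit)
# ('negation + negative verb', 'no objection', 'Positive')
# ('negative adjective + noun', 'unreasonable move', 'Negative')
```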
7. General scenario of the system

Our system is currently under linguistic analysis and testing; an implementation should follow shortly. We plan to use the TextCoop platform developed at IRIT (http://www.irit.fr/recherches/SAMOVA/pagetextcoop.html), whose aim is to extract discourse fragments based on grammar structures or patterns. The general scenario offered by the system is as follows:
- Produce a controversial statement for which opinions for or against are expected.
- Extract from news sources a set of related editorials. This step is very difficult and under investigation: identifying relatedness between a statement and a text requires textual entailment in most cases.
- Identify zones in each selected editorial which show a strong relatedness with the controversial issue.
- Identify arguments in these zones, together with their orientation and strength, from the linguistic criteria given in this paper.
- Construct a synthesis, and investigate its different forms (temporal, strength-based, etc.).

Obviously, this work is quite vast, and each step will have to be resolved gradually. Domain specification is an additional issue that needs attention; in our case, we focus on political and general social issues.

8. Evaluation of the Annotations

The collected texts have been annotated in XML format, using the semantic tag set described above, by two annotators with a good knowledge and understanding of the English language. The inter-annotator agreement kappa value was approximately 0.80, which indicates substantial agreement between the annotators. Disagreements were mainly noted for three attributes: Expression_type (with values Fact, Opinion, Undefined), Opinion_orientation (with values Positive, Negative, Neutral) and Orientation_support (with values For, Against).
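For reference, the agreement figure quoted above can be computed as follows. This is a standard Cohen's kappa sketch written by us, not the authors' evaluation code, and the toy labels are invented for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    chance = sum(counts_a[lab] * counts_b[lab] for lab in counts_a) / (n * n)
    return (observed - chance) / (1 - chance)

# Toy example over the Expression_type attribute (Fact/Opinion/Undefined).
a = ["Fact", "Opinion", "Opinion", "Undefined", "Opinion"]
b = ["Fact", "Opinion", "Fact", "Undefined", "Opinion"]
print(round(cohens_kappa(a, b), 2))  # 0.69
```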
The annotated corpus was peer reviewed for inconsistency checks, and disagreements were resolved by mutual discussion as well as by consultation with experts. We provide the current statistics of the corpus annotation work in Table 10:

Attribute: value
- Major theme: Socio-political
- Number of themes: 22
- Number of editorials: 56
- Time period: late 2007 to mid-2009
- Editorial sources: both Nepali national and international
- Number of opening statements (conclusions/claims): 22
- Number of positive arguments with respect to the claim: 108
- Number of negative arguments with respect to the claim: 52

Table 10: Current statistics of the corpus annotation

We plan to annotate some 300 additional arguments from our compiled corpus. The annotated corpus will be used both as training data and as test data for our work on analyzing opinions and argumentation in editorials. Further, we may extend our annotation scheme, and consequently the annotation work, to reflect the different argumentation schemes found in news editorials and their varying roles in the analysis of changes of opinions over time. Such an extension would also play a significant role in the synthesis of arguments from one or several editorial sources of the same or similar dates on a common topic, which we plan to address in the future.

9. Perspectives

We have presented in this paper the basic linguistic elements and annotation schemes for dealing with the analysis of opinions for or against a given controversial issue. The work is based on news editorials in English. It raises a number of very challenging issues that we still need to address, among which:
- how to define a relatedness metric (or other means) that would indicate, given a controversial issue, whether a portion of an editorial addresses it or not;
- how to identify arguments, and attacks or contradictions between arguments;
- how to summarize the results, possibly over a large time span; the summary can be graphic, as illustrated below, or textual;
- finally, how to evaluate such a system (accuracy, portability, etc.) and to what population it is applicable.

Acknowledgements

We would like to thank Prof. Patrick Hall for his continuous support of and inspiration for this work. This work was partly supported by the French STIC-Asia programme. Thanks also go to Kathmandu University and Madan Puraskar Pustakalaya, Nepal, for their support of this work.

References

Carlson, L., Marcu, D. and Okurowski, M. (2001). Building a Discourse-tagged Corpus in the Framework of Rhetorical Structure Theory. In Proceedings of the Second SIGdial Workshop on Discourse and Dialogue, Aalborg, Denmark. Association for Computational Linguistics, pp. 1-10.
Dunworth, K. (2008). UniEnglish reading: distinguishing facts from opinions. [Online]. http://unienglish.curtin.edu.au/local/docs/RW_facts_opinions.pdf
Kolflaath, E. (2007). The Strength of Arguments. Dissertation, Department of Philosophy and Faculty of Law, University of Bergen.
Martin, J. and White, P.R. (2005). The Language of Evaluation: Appraisal in English. London: Palgrave Macmillan.
Read, J., Hope, D. and Carroll, J. (2007). Annotating Expressions of Appraisal in English. In Proceedings of the ACL 2007 Linguistic Annotation Workshop, Prague, Czech Republic.
Stoyanov, V. and Cardie, C. (2004). Evaluating an Opinion Annotation Scheme Using a New Multi-Perspective Question and Answer Corpus. In Computing Attitude and Affect in Text: Theory and Practice. Springer, pp. 77-89.
Taboada, M. and Renkema, J. (2008). Discourse Relations Reference Corpus. Simon Fraser University and Tilburg University.
Toulmin, S.E. (2008). The Uses of Argument, Updated Edition. New York: Cambridge University Press.
Wiebe, J., Wilson, T. and Cardie, C. (2005). Annotating Expressions of Opinions and Emotions in Language. Language Resources and Evaluation, 39(2-3).
Wilson, T. (2003). Annotating Opinions in the World Press. In Proceedings of SIGdial-03, pp. 13-22.
Wilson, T. (2005). Annotating Attributions and Private States. In Proceedings of the ACL 2005 Workshop Frontiers in Corpus Annotation II: Pie in the Sky, pp. 53-60.

Figure 1: A manual diagrammatic analysis of the argumentation structure of an editorial excerpt
Figure 2: A diagrammatic analysis of the argumentation structure of an editorial excerpt using the Athena software