2009 Eighth International Symposium on Natural Language Processing

Who Speaks for Whom? Towards Analyzing Opinions in News Editorials

Bal Krishna Bal and Patrick Saint-Dizier

Abstract—The paper discusses the ongoing work of editorial analysis and synthesis construction, basically text annotation and the linguistic criteria for distinguishing between facts and opinions. We also discuss the factors that play a crucial role in determining the strength of opinions, as well as the process of outlining the argumentation structure, a major part of our work that directs the analysis of opinions at the discourse level.

I. INTRODUCTION

With the increasing interest of the general public in socio-political happenings, it is a growing practice these days to read and analyze the different opinions on a particular event published by the media in the form of editorials. Such an analysis not only helps to understand how a particular event has been perceived by different media sources but also provides a relatively true view of the happenings, and hence is of primary interest to journalists, public figures and political analysts. The online electronic resource http://www.nepalmonitor.com, for instance, includes editorials from different national and international newspapers organized on a monthly basis. These editorials basically discuss some of the prime events that have taken place in Nepal in a particular month. The editorial sources in the link provided above include Voice of America, The Japan Times, The Washington Times, The New Nation – Bangladesh, Dawn, Gulf News – UAE, The Himalayan Times, The Kathmandu Post, The Hindu – India, Times of India, The Indian Express and Economic Times – India. It is indeed interesting to see how these editorials differ in opinions, how convincing or persuasive the arguments appear in providing support to certain conclusion(s), and, if possible, to judge the different degrees of bias and prejudice evident in them. These problems are quite difficult even for humans, let alone the machine. From an automation perspective, it would be useful to have a provision for constructing a synthesis of the different opinionated arguments (Positive, Negative and Neutral) in one document, with some useful information (source, date, orientation etc. of the editorial) clearly mentioned, so that readers need not unnecessarily go through all of them, yet get a vivid picture of the happenings or events. Even better would be a mechanism to track changes in opinion across editorials on a common topic over time.

The proposed work aims to build a framework, and more precisely a computational linguistic model, that would suggest appropriate techniques and methods for analyzing the editorials and constructing a synthesis. At the moment, we have identified the different linguistic components and are working towards specifying the different underlying computational procedures required for the model.

The organization of the paper is as follows. In section I, we introduce our problem, state the research aims and briefly describe the current status of the work. In section II, we shed light on Opinion Mining, discuss the different sub-problems under the larger problem, and correlate these sub-problems with our problem of editorial analysis and synthesis. In section III, we give an overview of related work and highlight the novelties that our work carries. In section IV, we discuss the linguistic basis for distinguishing facts and opinions. In section V, we address the linguistic aspects of determining the strength of opinions. In section VI, we describe one of the crucial components of our research work – outlining the argumentation structure of editorials (support and rhetorical relations) – and briefly discuss the semantic tagset employed for annotation. In section VII, we report our ongoing work on editorial collection and annotation.
II. OPINION MINING AS A PROBLEM

Although Opinion Mining has emerged only quite recently as a subdiscipline of computational linguistics, a considerable amount of work has already been done in this direction. These works cover a variety of task domains, from mining the product reviews available on the web, sentiment classification of documents, and opinion mining and summarization to much more. Irrespective of the nature of the specific tasks, Opinion Mining generally encompasses the following generic problems:

1. Determining subjectivity, i.e. identifying the subjective and objective expressions in texts [1, 2, 7].
2. Determining the orientation or polarity of the subjective expressions [3, 4, 5, 6, 11].
3. Determining the strength of the orientation of the subjective expressions [8]. This involves deciding whether the Positive or Negative opinion expressed in texts is Weakly Positive/Negative, Mildly Positive/Negative or Strongly Positive/Negative.

Manuscript received August 8, 2009.
B. Krishna Bal is with the Madan Puraskar Pustakalaya, Lalitpur, Patan Dhoka, Nepal (phone: 977-1-5521393; fax: 977-1-5536390; e-mail: [email protected]).
P. Saint-Dizier is with IRIT, 118 Narbonne, 31062 Toulouse, France (e-mail: [email protected]).
978-1-4244-4139-6/09/$25.00 ©2009 IEEE

Our problem of analyzing editorials essentially involves all of the problems 1-3 above. Additionally, it also requires opinion analysis at higher levels, i.e. the discourse level. We will discuss this in later sections.

III. RELATED WORK

Although our work belongs to the general class of Opinion Mining, it also encompasses the analysis of the Argumentation Structure in editorial texts, which follows from Argumentation Theory. From the Opinion Mining perspective, our work is closest to that of [9, 10, 11]. While [9] employ major topic detection and the concept of relevant sentences for opinion analysis and summarization, [10] additionally address opinion tracking using sentiment scores. Similarly, [11] focus primarily on finding Opinion Holders and the topics expressed in online news media text. Clearly, all of these only partially address our problem.

Turning to Argumentation Theory, there has been work in the AI and Law community on the annotation of legal texts using argumentation schemes. [12, 13], for instance, make use of argumentation schemes, which are a popular way of categorizing certain patterns of arguments appropriate to contexts. Similarly, [14] looks at the strength of arguments based on the perspectives held by the audience to whom the argument is addressed. [15] deals with the analysis of the structure of argumentative discourse, producing a model for the same. Hence, although works that partially address our problem exist in more than one research domain, to the best of our knowledge no work is known today that embraces the two fields – Opinion Mining and Argumentation Theory – altogether for the analysis and construction of a synthesis of opinion arguments from one or more editorials on a common topic. Our work does. It takes its basics for opinion identification, extraction and strength determination at the word and phrase level from Opinion Mining, whereas the opinion analysis at the discourse level is conducted on the basis of Argumentation Theory, thus analyzing the argumentation structure found in editorials.

IV. LINGUISTIC BASIS FOR DISTINGUISHING FACTS AND OPINIONS

Since editorials are usually a mix of facts and opinions, there is clearly a need to make a distinction between them. Opinions often express an attitude towards something. This can be a judgment, a view, a conclusion, or even an opinion about opinion(s). Different approaches have been suggested to distinguish facts from opinions. Generally, facts are characterized by the presence of certain verbs like "declare" and different tense and number forms of the verb "be" etc. Moreover, statements interpreted as facts are generally accompanied by some reliable authority providing the evidence of the claim. In Table 1, two examples of factual sentences and their respective sources of evidence are presented.

Table 1. Facts and reliable authority

Fact: Both the two dates announced for the constituent assembly (CA) elections came and went without the vote taking place.
Reliable authority: Election Commission for CA elections 2007.

Fact: We have fewer people getting killed every day. (December 2007)
Reliable authority: Nepal Police, Department of Crime and Investigation.

Opinions, on the other hand, are characterized by evaluative expressions of various sorts such as the following [17]:

a) Presence of evaluative adverbs and adjectives in sentences – "ugly" and "disgusting".
b) Expressions denoting doubt and probability – "may be", "possibly", "probably", "perhaps", "may", "could" etc.
c) Presence of epistemic expressions – "I think", "I believe", "I feel", "In my opinion" etc.

It is obvious that the distinction between the two is not always straightforward. Facts could well be opinions in disguise and, in such cases, the intention of the author as well as the reliability of the information needs to be verified. In order to make a finer distinction between facts and opinions, and within opinions themselves, we propose a gradation of opinions as shown in Table 2.

Table 2. Gradation of opinions

Opinion type – Global definition
Hypothesis statements – Explains an observation.
Theory statements – Widely believed explanation.
Assumptive statements – Improvable predictions.
Value statements – Claims based on personal beliefs.
Exaggerated statements – Intended to sway readers.
Attitude statements – Based on an implied belief system.

Source: [www.clc.uc.edu/documents_cms/TLC/Fact_and_Opinion.ppt]
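As an illustration only (the paper does not provide an implementation), the surface cues listed in points (a)-(c) above can be turned into a simple rule-based detector. The cue lists below mirror those points; the function and variable names are our own, not from the paper:

```python
# A minimal sketch of the cue-based fact/opinion distinction of section IV.
# The cue lists mirror points (a)-(c); names are our own illustration.
import re

EVALUATIVE = {"ugly", "disgusting"}                                      # (a)
DOUBT_WORDS = {"possibly", "probably", "perhaps", "may", "could"}        # (b)
PHRASES = {"may be", "i think", "i believe", "i feel", "in my opinion"}  # (b)/(c)

def classify_sentence(sentence: str) -> str:
    """Tag a sentence 'Opinion' if any opinion cue fires, else 'Fact'."""
    lowered = sentence.lower()
    words = set(re.findall(r"[a-z']+", lowered))
    if words & (EVALUATIVE | DOUBT_WORDS):
        return "Opinion"
    if any(phrase in lowered for phrase in PHRASES):
        return "Opinion"
    # In the paper, facts are further backed by a reliable authority (Table 1).
    return "Fact"

print(classify_sentence("I believe the elections will take place."))        # Opinion
print(classify_sentence("Both dates for the CA elections came and went."))  # Fact
```

Such a cue-based baseline would, of course, miss facts that are opinions in disguise, which is exactly where the reliability verification discussed above becomes necessary.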
For the purpose of developing a linguistic base for identifying opinions (opinion words or phrases) in texts, we maintain a Polarity lexicon with opinion words and expressions collected from the corpus, categorized into prototypically positive and negative sets. Next, by consulting the available electronic resources like the dictionary, the thesaurus and even WordNet, we manually increase the size of the lexicon by introducing synonyms of the entries already compiled from the corpus. This gives the opportunity of compiling a rich collection of opinions – both context dependent (phrases from the corpus) and context independent (words from the dictionary and other resources). Moreover, as part of the lexicon building, we group semantically similar members within the bigger sets into smaller subsets. In Table 3, we provide a sample of the polarity lexicon.

Table 3. Polarity lexicon

Positive:
Peace – {peace(n), peaceful(adj), accord(n), pact(n), treaty(n), pacification(n), pacify(v), peacefulness(n), serenity(n)}
Happy – {happy(adj), happiness(n), felicitous(adj), glad(adj), willing(adj), felicity(n)}

Negative:
Infamy – {infamy(n), discredit(n), disrepute(n), notoriety(n), infamous(adj), dishonor(n), notorious(adj)}
Height of impunity, drama of consensus.

V. STRENGTH OF OPINIONS

Besides detecting the polarity of opinions as Positive, Negative or Neutral, it is equally important to determine the strength of the opinions (Weak, Strong, Mildly Weak, Mildly Strong etc.) present in text. For this purpose, we have developed the Intensifier and Pre-modifier lexicons, which basically consist of adverbs and pre-modifiers; the latter come in front of adverbs and adjectives. Both intensifiers and pre-modifiers play a role in conveying greater and/or lesser emphasis to something. Intensifiers are reported to have three different functions – emphasis, amplification and downtoning. In Table 4, we present a sample of the intensifiers.

Table 4. Intensifier lexicon

Type: Emphasizer
Value: Really: truly, genuinely, actually. Simply: merely, just, only, plainly. Literally. For sure: surely, certainly, sure, for certain, sure enough, undoubtedly. Of course: naturally.

Type: Amplifiers
Value: Completely: all, altogether, entirely, totally, whole, wholly. Absolutely: totally and definitely, without question, perfectly, utterly. Heartily: cordially, warmly, with gusto and without reservation.

Type: Downtoners
Value: Kind of: sort of, kinda, rather, to some extent, almost, all but. Mildly: gently.

Source: [www.grammar.ccc.commnet.edu/grammar/adverbs.htm]

We include an example for each of the above categories of intensifiers and their role in changing the strength of opinions:

Bad – Low, Really bad – High
Quiet – Low, Absolutely quiet – High
Friendly – Average, Sort of friendly – Low

Similarly, in Table 5, we present a sample of the pre-modifiers and show their contribution to the overall strength of the expressions.

Table 5. Pre-modifier lexicon

Adverb/Adjective (Strength) – Pre-modifier – Pre-modified form (Strength)
Fast (Low) – Very – Very fast (High)
Careful (Low) – Lot more – Lot more careful (Average)
Better (Average) – Much – Much better (High); Much much better (High)
Serious (Low) – Much more – Much more serious (High)
Good (Low) – Somewhat – Somewhat good (Average); Quite – Quite good (Average)

Source: [www.grammar.ccc.commnet.edu/grammar/adjectives.htm#a-_adjectives]

We are also currently working on report and modal verbs and their respective roles in determining the strength of opinions. Their precise contribution for this purpose is still subject to further study.

VI. OUTLINING THE SUPPORT AND RHETORICAL RELATIONS IN EDITORIALS

From opinion mining and analysis at the word, phrase and sentence level, we now move to higher levels of analysis, i.e. the discourse level, style of writing, political affiliations of the editorials and so on. Exactly here comes the necessity of outlining the argumentation structure of editorials. Editorials consist of an argumentation structure comprising a conclusion statement, which is in turn supported by other statements (also known as the supports) for or against the conclusion. These supports as well as the conclusion can be either facts or opinions. The supports may be further developed by means of text fragments, also widely known as rhetorical relations.

We have been working towards analyzing the argumentation structure of editorials, thus determining the persuasiveness inherent in texts. The result is a discourse analysis of opinions producing a semantic representation. Ultimately, the analyzed argumentation structure would be used to construct a synthesis of positive and negative arguments from one or several editorials (single or multiple sources) over a common date or a span of time.
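Before turning to the discourse level, the word- and phrase-level machinery of sections IV-V can be sketched as follows. The lexicon entries come from Tables 3-5, but the scoring function itself is our own illustrative assumption, not the authors' actual procedure:

```python
# Sketch: polarity plus strength scoring for a short opinion phrase.
# Lexicon entries mirror Tables 3-5; the combination rules are our
# illustrative assumptions, not the paper's algorithm.

POLARITY = {  # from the polarity lexicon of Table 3
    "peace": "Positive", "peaceful": "Positive", "happy": "Positive",
    "infamy": "Negative", "notorious": "Negative", "bad": "Negative",
}

# Intensifiers and pre-modifiers and the strength they induce (Tables 4-5).
STRENGTH_SHIFT = {
    "really": "High", "absolutely": "High", "very": "High",
    "much": "High", "somewhat": "Average", "quite": "Average",
    "sort of": "Low", "kind of": "Low",
}

def score_phrase(phrase: str, base_strength: str = "Low"):
    """Return (polarity, strength) for a short opinion phrase."""
    lowered = phrase.lower()
    polarity = next((p for w, p in POLARITY.items() if w in lowered.split()),
                    "Neutral")
    strength = base_strength
    for modifier, level in STRENGTH_SHIFT.items():
        if modifier in lowered:
            strength = level
            break
    return polarity, strength

print(score_phrase("really bad"))  # ('Negative', 'High')
print(score_phrase("bad"))         # ('Negative', 'Low')
```

A full treatment would also consult the base strength recorded per adjective, as Table 5 does (e.g. "better" starts at Average rather than Low).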
In our semantic and pragmatic representation of editorials, the root node is a conclusion. The conclusion has the attributes polarity (Positive, Negative or Neutral), date and source. Next, the root node is associated with one or more supports. Similarly, the support relations have the attributes date, source, orientation of support (for or against), reporting level (characterized by report verbs and modal verbs expressing different levels of commitment), conditional level (Yes or No, indicating the support's association with some other supports), and strength of the argument (in terms of direct, relative and persuasion effect).

Table 6. Semantic tagset

Parameters – Possible values
Argument_type – Support, Conclusion, Rhetorical_relation
Expression_type – Fact, Opinion, Undefined
Fact_authority – Yes, No
Opinion_orientation – Positive, Negative, Neutral
Orientation_support – For, Against
Id – Id number of the support
Date – Date of publication of the editorial
Source – Source or name of the newspaper
Commitment – Modal, Low, High
Conditional – Yes, No
Direct-strength – Low, Average, High
Relative-strength – Low, Average, High
Persuasion-effect – Low, Average, High
Rhetoric_relation type – Exemplification, Contrast, Discourse frame, Justification, Elaboration, Paraphrase, Cause-effect, Result, Explanation, Reinforcement

Next, for our purpose of editorial analysis, we have used the following rhetorical relations [16]:

• Exemplification: illustrates a support, while giving it a higher strength and persuasion effect.
• Contrast: relates two supports A and B, where A and B are both true while partly contradicting each other. They are in general linked by connectors such as nevertheless, although, but, even, if etc.
• Discourse frame: introduces a factual statement which indicates the environment and scope of the conclusion (time, facts etc.).
• Justification: where B gives reasons for and explains A. This relation is stronger than the explanation relation.
• Elaboration: where B is an elaboration of A if it develops or describes a part of A.
• Paraphrase: which is just another way of stating the support or conclusion, adding strength to the statement.
• Cause-effect: establishes a causal relationship between supports.
• Result: where B results at least partly or indirectly from A.
• Explanation: where B is an explanation for A if it indicates the reasons for A in a quite neutral way.
• Reinforcement: where B reinforces A. It is stronger than an elaboration, an exemplification or an explanation. In general, it contains specific marks related to confirmation, enforcement etc.
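The tagset of Table 6 lends itself to a typed data model. The following sketch encodes some of its attributes as Python enums and dataclasses; all class and field names are our own, chosen for illustration:

```python
# Illustrative encoding of part of the Table 6 tagset; names are ours.
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class Orientation(Enum):
    POSITIVE = "Positive"
    NEGATIVE = "Negative"
    NEUTRAL = "Neutral"

class Strength(Enum):
    LOW = "Low"
    AVERAGE = "Average"
    HIGH = "High"

@dataclass
class Support:
    id: int
    date: str                 # date of publication of the editorial
    source: str               # source or name of the newspaper
    orientation: Orientation
    support_type: str         # "For" or "Against"
    conditional: bool = False
    direct_strength: Strength = Strength.LOW
    text: str = ""

@dataclass
class Conclusion:
    date: str
    source: str
    orientation: Orientation
    strength: Strength
    text: str = ""
    supports: List[Support] = field(default_factory=list)

# Values taken from the example in the corpus (Kathmandu Post, 2007-12-28).
c = Conclusion("2007-12-28", "KTMPOST", Orientation.POSITIVE, Strength.HIGH,
               "[CA election] will take place in 2008.")
c.supports.append(Support(1, "2007-12-28", "KTMPOST",
                          Orientation.POSITIVE, "For"))
print(len(c.supports))  # 1
```

Rhetorical relations could be modeled analogously as labeled pairs of support ids, one entry per relation type of Table 6.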
Below, we present an example of the argumentation structure from our corpus of editorials. We have partly used the semantic tagset that we have developed (see Table 6) in defining the argumentation structure. The semantic tagset, which is now more or less stabilized, is the result of a careful study and analysis of the raw corpus from the perspective of opinion analysis and argumentation outlining. It combines aspects of both Opinion Mining and Argumentation Theory, thus clearly conforming to our needs.

Conclusion: (<Date:2007-12-28>, <Source:KTMPOST>, <Orientation:Positive>, <Strength:High>)
[CA election] will take place in 2008.

Support: (<ID:1>, <Date:2007-12-28>, <Source:KTMPOST>, <Orientation:Positive, Support Type:For>, <Strength:Low>)
The Post believes that the long awaited and ever elusive [CA elections] will take place this year.

Rhetorical_relation: Justification(1,2)
// Support 2 is a Justification of Support 1.

Support: (<ID:2>, <Date:2007-12-28>, <Source:KTMPOST>, <Orientation:Positive, Support Type:Conditional, For>, <Strength:Low>)
If we behave responsibly, we will be able to hold the [CA elections].

Rhetorical_relation: Justification(1,3)
// Support 3 is a Justification of Support 1.

Support: (<ID:3>, <Date:2007-12-28>, <Source:KTMPOST>, <Orientation:Positive, Support Type:Conditional, For>, <Strength:Low>)
If the Maoists do not run away from elections, if the recently formed and old parties of the terai live up to the promises to allow [elections] happen…

In the example above, the conclusion is characterized by a vector that contains id, date, source, orientation and strength. The conclusion is followed by supports and rhetorical relations; supports are described in the same manner as the conclusion. The referential expression, which binds the supports to the event reported in the conclusion, is put inside square brackets; hence [CA elections] and [elections] are the referential expressions in the example above. Similarly, the underlined text portions are the opinion anchors, i.e. those terms that a priori mark the statement as an opinion. For the strength, we are currently only considering the attribute direct-strength. The other two attributes will be gradually incorporated.
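Once several editorials have been analyzed into such structures, the synthesis of positive and negative arguments aimed at in section VI amounts to grouping supports by orientation while keeping source and date visible. The sketch below is our own illustration; the first two records come from the example above, while the third is a hypothetical counterargument:

```python
# Sketch: grouping opinion arguments from analyzed editorials into a
# positive/negative synthesis. Record and field names are ours; the
# NEPALITIMES record is hypothetical, for illustration only.
from collections import defaultdict

supports = [
    {"source": "KTMPOST", "date": "2007-12-28", "orientation": "Positive",
     "text": "The Post believes that the [CA elections] will take place this year."},
    {"source": "KTMPOST", "date": "2007-12-28", "orientation": "Positive",
     "text": "If we behave responsibly, we will be able to hold the [CA elections]."},
    {"source": "NEPALITIMES", "date": "2008-01-04", "orientation": "Negative",
     "text": "..."},  # hypothetical counterargument record
]

def synthesize(records):
    """Group supports by orientation, keeping source and date visible."""
    synthesis = defaultdict(list)
    for r in records:
        synthesis[r["orientation"]].append(
            f'({r["source"]}, {r["date"]}) {r["text"]}')
    return dict(synthesis)

for orientation, items in synthesize(supports).items():
    print(orientation, len(items))
```

Restricting `records` to a common date or a span of time gives the temporal tracking of opinions mentioned in the introduction.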
Rhetorical relations in the above case further develop the supports and are characterized by the links existing between supports.

VII. TEXT COLLECTION AND ANNOTATION

With the aim of developing training data for tagging and thereafter analyzing the editorial texts, editorials have been collected from at least three different sources. The collected texts serve as the corpus for our research work. The editorials represent a common theme – Socio-political – and a subtheme – Peace and stability – and are taken from different dates towards the end of the year 2007 and the beginning of 2008, amounting to a total of 300 plus text files, with a total of 6000 sentences and an average of 20 sentences per editorial. The texts are taken respectively from The Kathmandu Post Daily, http://ekantipur.com/ktmpost.php, The Nepali Times Weekly, http://nepalitimes.com.np and The Spotlight Weekly, http://nepalnews.com/spotlight.php. We plan to extend the collection by including editorials from both national and international newspapers covering a wide range of domains like society, culture, health, education etc.

The collected texts have been annotated by two annotators having a fairly good understanding of the English language. The annotators have been assigned the same texts to see how semantic annotations can differ among annotators. Results have shown that the difficulties in manual annotation exist at two levels: the first in determining the orientation or polarity of words or expressions, and the second in evaluating their strengths for the three strength attributes – direct-strength, relative-strength and persuasion-strength. Wherever the annotators were unsure about providing one particular value, they were instructed to provide multiple values separated by commas. The annotations made by the annotators were then exchanged with each other for peer review, basically to determine the inter-annotator disagreement rates. The disagreements were noted while tagging the text, i.e. while picking the values for the attributes Expression_type, Opinion_orientation, Orientation_support, Commitment, Direct-strength, Relative-strength, Persuasion-effect and Rhetoric_relation type. It was found that the disagreement level was most frequent for the tag Expression_type (one in every five tagged words), followed by Opinion_orientation (one in every ten tagged words), Orientation_support (one in every fifteen tagged words) and so on. The disagreements were resolved by mutual discussions as well as consultations with linguist experts.

In Fig. 1, we present a diagrammatic representation of the argumentation structure for an editorial using the Athena software available at http://www.athenasoft.org.

Fig. 1. Diagrammatic representation of the argumentation structure

In the diagrammatic representation above, the topmost node is the conclusion, followed by child nodes below. The nodes highlighted partially in green represent positive supports, whereas the ones in red are counterarguments or negative supports to the conclusion. The text in the yellow box gives detailed information on each node, in our case the different attribute-value pairs for the attributes (date, source, orientation of support, strength etc.), which can be entered while developing the diagram and can be read by moving the mouse cursor to the respective node.
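The disagreement rates reported in section VII can be computed mechanically once the two annotators' tag sequences are aligned. The sketch below is our own illustration (the paper does not describe its tooling); the tag sequences are made up, and the comma-handling follows the paper's instruction that unsure annotators may give multiple values:

```python
# Sketch: per-attribute inter-annotator disagreement rate, as used to
# compare the two annotators in section VII. Tag sequences are made up.

def disagreement_rate(tags_a, tags_b):
    """Fraction of aligned positions where the two annotators differ.
    Comma-separated multiple values count as agreement on any overlap."""
    assert len(tags_a) == len(tags_b)
    disagreements = 0
    for a, b in zip(tags_a, tags_b):
        if not set(a.split(",")) & set(b.split(",")):
            disagreements += 1
    return disagreements / len(tags_a)

ann1 = ["Opinion", "Fact", "Opinion", "Fact,Opinion", "Fact"]
ann2 = ["Opinion", "Opinion", "Opinion", "Fact", "Fact"]
print(disagreement_rate(ann1, ann2))  # 0.2, i.e. one in every five tags
```

Running this per attribute (Expression_type, Opinion_orientation, etc.) would reproduce the "one in every N tagged words" figures reported above.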
ACKNOWLEDGMENT
We would like to thank Prof. Patrick Hall for his
continuous support and inspiration for this work. This work
was partly supported by the French Stic-Asia program.
Thanks are also due to Madan Puraskar Pustakalaya, Nepal
for the support to this work.
REFERENCES
[1] J. M. Wiebe, R. F. Bruce, and T. P. O'Hara, "Development and use of a gold-standard data set for subjectivity classifications," in Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, College Park, Maryland, 1999.
[2] V. Hatzivassiloglou and J. M. Wiebe, "Effects of adjective orientation and gradability on sentence subjectivity," in Proceedings of the 18th Conference on Computational Linguistics - Volume 1, Saarbrücken, Germany, 2000, pp. 299-305.
[3] V. Hatzivassiloglou and K. R. McKeown, "Predicting the semantic orientation of adjectives," in Proceedings of the 35th Annual Meeting of the ACL and the 8th Conference of the European Chapter of the ACL, Madrid, Spain, 1997, pp. 174-181.
[4] P. D. Turney, "Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews," in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, 2002.
[5] A. Esuli and F. Sebastiani, "Determining the semantic orientation of terms through gloss classification," in Proceedings of the 14th ACM International Conference on Information and Knowledge Management, Bremen, Germany, 2005.
[6] B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up?: sentiment classification using machine learning techniques," in Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10, 2002.
[7] E. Riloff, J. Wiebe, and T. Wilson, "Learning subjective nouns using extraction pattern bootstrapping," in Proceedings of the Seventh Conference on Natural Language Learning (CoNLL-03), 2003.
[8] T. Wilson, J. Wiebe, and P. Hoffmann, "Recognizing contextual polarity in phrase-level sentiment analysis," in Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, Vancouver, British Columbia, Canada, 2005.
[9] L.-W. Ku, L.-Y. Lee, and T.-H. Wu, "Major topic detection and its application to opinion summarization," in SIGIR 2005, 2005, pp. 627-628.
[10] L. Ku, Y. Liang, and H. Chen, "Opinion extraction, summarization and tracking in news and blog corpora," in Proceedings of the AAAI-2006 Spring Symposium on Computational Approaches to Analyzing Weblogs, 2006.
[11] S.-M. Kim and E. Hovy, "Extracting opinions, opinion holders, and topics expressed in online news media text," in Proceedings of the ACL/COLING Workshop on Sentiment and Subjectivity in Text, Sydney, Australia, 2006.
[12] M.-F. Moens, E. Boiy, R. M. Palau, and C. Reed, "Automatic detection of arguments in legal texts," in Proceedings of the 11th International Conference on Artificial Intelligence and Law, Stanford, California, 2007, pp. 225-230.
[13] D. N. Walton, Argumentation Schemes for Presumptive Reasoning. Mahwah, NJ: Lawrence Erlbaum Associates, 1996.
[14] T. J. M. Bench-Capon, "Agreeing to differ: modelling persuasive dialogue between parties without a consensus about values," Informal Logic, vol. 22, no. 3, pp. 231-245, 2002.
[15] R. Cohen, "Analyzing the structure of argumentative discourse," Computational Linguistics, vol. 13, pp. 11-24, 1987.
[16] D. Marcu, "The rhetorical parsing of natural language texts," in Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics, Madrid, Spain, 1997, pp. 96-103.
[17] K. Dunworth, (2008) UniEnglish reading: distinguishing facts from opinions. http://unienglish.curtin.edu.au/local/docs/RW_facts_opinions.pdf