Sentiment and Subjectivity Analysis: An Overview

Alina Andreevskaia
Department of Computer Science
Concordia University
Montreal, Quebec, Canada

Nancy Ide
Department of Computer Science
Vassar College
Poughkeepsie, New York, USA
Definition
• Sentiment Analysis
– Also called Opinion Mining
– Classify words/senses, texts,
documents according to the opinion,
emotion, or sentiment they express
• Applications
– Determining critics’ opinions of
products
– Track attitudes toward political
candidates
– etc.
Sub-Tasks
• Determine Subjective-Objective polarity
– Is the text or language factual or an
expression of an opinion?
• Determine Positive-Negative polarity
– Does the subjective text express a positive
or negative opinion of the subject matter?
• Determine the strength of the opinion
– Is the opinion weakly positive/negative,
strongly positive/negative, or neutral?
History
• Current work stems from
– Content analysis
• “analysis of the manifest and latent content of
a body of communicated material (as a book or
film) through classification, tabulation, and
evaluation of its key symbols and themes in
order to ascertain its meaning and probable
effect” (Webster’s Dictionary of the English
Language, 1961)
• Long history
– Quantitative newspaper analysis (1890-on)
– Lasswell, 1941: study of political symbols in
editorials and public speeches
– Gerbner, 1969: establish “violence profiles” for
different TV networks; trace trends; see how
various groups are portrayed
Content Analysis
• Psychology and sociology
– Analysis of verbal patterns to determine
• motivational, mental, personal characteristics
(1940’s)
• group processes
• cultural commonalities/differences (Osgood, Suci,
and Tannenbaum, 1957, semantic differential scales)
– Major contribution: General Inquirer
• http://www.wjh.harvard.edu/~inquirer/
• Anthropology
– Study of myths, riddles, folktales
– Analysis of kinship terminology (Goodenough, 1972)
• Literary and Rhetorical analysis
– Stylistic analysis (Sedelow and Sedelow, 1966)
– Thematic analysis (Smith, 1972; Ide, 1982, 1989)
Other Roots
• Point of view tracking in
narrative (Banfield, 1982;
Uspensky, 1973; Wiebe, 1994)
– Subjectivity analysis
• Affective Computing (Picard, 1997)
– Develop means to enable computers to
detect and appropriately respond to
the user's emotions
• Directionality (Hearst, 1992)
– Determine if author is positive,
neutral, or negative toward some part
of a document
History
• Late ‘90’s: first automatic
systems implemented for NLP
– Spertus, 1997
– Wiebe and Bruce, 1995
– Wiebe et al., 1999
– Bruce and Wiebe, 2000
• Now a major research stream
Current work in NLP
• Sentiment tagging
– Assignment of positive, negative, or
neutral values/tags to texts and their
components
– Began with focus on binary (positive-negative) classification
– More recently, neutral labels included
• Little work on other types of
affect
– Still a focus in much content
analysis work in other fields
Other Work
• Assignment of fine-grained affect labels based on various
psychological theories (Valitutti et al., 2004; Strapparava
and Mihalcea, 2007)
• Detection of
– opinion holders (Kim and Hovy, 2004; Kim and Hovy, 2005; Kim and
Hovy, 2006; Choi et al., 2005; Bethard et al., 2004; Kobayashi
et al., 2007)
– opinion targets (Hurst and Nigam, 2004; Gamon and Aue, 2005; Hu
and Liu, 2004; Popescu and Etzioni, 2005; Kim and Hovy, 2006;
Kobayashi et al., 2007)
– perspective (Lin et al., 2006)
– pros and cons in reviews (Kim and Hovy, 2006a)
– bloggers’ mood (Mishne and Glance, 2006; Mishne, 2005; Leshed
and Kaye, 2006)
– happiness (Mihalcea and Liu, 2006)
– politeness (Roman et al., 2005)
• Assignment of ratings to movie reviews (Pang and Lee, 2005)
• Identification of support/opposition in congressional
debates (Thomas et al., 2006)
• Prediction of election results (Kim and Hovy, 2007)
Subjectivity
• Focuses on determining
subjective words and texts
that mark the presence of
opinions and evaluations vs.
objective words and texts,
used to present factual
information (Wiebe, 2000;
Wiebe et al., 2004; Wiebe and
Riloff, 2005)
Terminology
• Many terms
– Sentiment classification, sentiment
analysis
– Semantic orientation
– Opinion analysis, opinion mining
– Valence
– Polarity
– Attitude
• Here, use the term sentiment
Theories of Emotion and
Affect
• Osgood's semantic differential (Osgood,
Suci, and Tannenbaum, 1957)
– Three recurring attitudes that people use to
evaluate words and phrases
• Evaluation (good-bad)
• Potency (strong-weak)
• Activity (active-passive)
• Ortony's salience-imbalance theory
(Ortony, 1979)
– defines metaphors in terms of particular
relationships between topic and vehicle
Theories of Emotion and
Affect
• Martin’s Appraisal Framework
– http://www.grammatics.com/appraisal/
• Three sub-types of attitude
– Affect (emotion)
• evaluation of emotional disposition
– Judgment (ethics)
• normative assessments of human behavior
– Appreciation (aesthetics)
• assessments of form, appearance, composition,
impact, significance, etc. of human artefacts and
individuals
Theories of Emotion and
Affect
• Elliot’s Affective Reasoner
– http://condor.depaul.edu/~elliott/ar.html
– Emotion groups, with their specifications and category labels / emotion types:
• Well-Being: appraisal of a situation as an event (joy, distress)
• Fortunes-of-Others: presumed value of a situation as an event affecting another (happy-for, gloating, resentment, jealousy, envy, sorry-for)
• Prospect-based: appraisal of a situation as a prospective event (hope, fear)
• Confirmation: appraisal of a situation as confirming or disconfirming an expectation (satisfaction, relief, fears-confirmed, disappointment)
• Attribution: appraisal of a situation as an act of some accountable agent (pride, admiration, shame, reproach)
• Attraction: appraisal of a situation as containing an attractive or unattractive object (liking, disliking)
• Well-being/Attribution: compound emotions (gratitude, anger, gratification, remorse)
• Attraction/Attribution: compound emotion extensions (love, hate)
Theories of Emotion and
Affect
• Ekman’s basic emotions
– Ekman’s work revealed that
facial expressions of emotion
are not culturally determined,
but universal to human culture
and thus biological in origin
– Found expressions of anger,
disgust, fear, joy, sadness, and
surprise to be universal
• Some evidence for contempt
Resource Development
• Semantic properties of individual
words are good predictors of the
semantic characteristics of a
phrase or text that contains them
• Requires development of lists of
words indicative of sentiment
Manually-created Lists
• General Inquirer
(GI)
– Best known and most extensive list of words annotated
with various categories
– Developed as part of content-analysis project
(Stone et al., 1966; Stone et al., 1997)
– Three main word lists:
• Harvard IV-4 dictionary of content-analysis categories
(includes Osgood’s three dimensions of value, power and
activity)
• Lasswell’s dictionary
– eight basic value categories (WEALTH, POWER, RESPECT, RECTITUDE,
SKILL, ENLIGHTENMENT, AFFECTION, WELLBEING) plus other info
• Five categories based on social cognition work of Semin and
Fiedler (1988)
– Verb and adjective types
– Some words tagged for sense
– Recognized as a gold standard for evaluation of
automatically produced lists
WordNetAffect
• Developed semi-automatically by Strapparava and
colleagues
– assigned affect labels to words in WordNet
– expanded the lists using WordNet relations such as
synonymy, antonymy, entailment, hyponymy
• Includes
– semantic labels based on psychological and social
science theories (Ortony, Elliot, Ekman)
– valence (positive or negative)
– arousal (strength of emotion)
• 2004 version covers 1314 synsets, 3340 words
• Part of WordNet Domains
– http://wndomains.itc.it/
Others
• Whissell’s Dictionary of Affect in
Language (DAL) (Sweeney and
Whissell, 1984; Whissell, 1989;
Whissell and Charuk, 1985)
• Affective Norms for English Words
(ANEW) (Bradley and Lang, 1999)
• Sentiment-bearing adjectives by
Hatzivassiloglou and McKeown
(1997)
• Sentiment and subjectivity clues
from work by Wiebe
Caution
• Limitations of manually
annotated lists
– limited coverage
– low inter-annotator agreement
– diversity of the tags used
Automatically-created
Lists
Corpus-based methods
• Hatzivassiloglou and McKeown (1997) (HM)
– builds on the observation that some linguistic
constructs, such as conjunctions, impose
constraints on the semantic orientation of their
constituents
– clustered adjectives from the Wall Street Journal
in a graph into positive and negative sets based
on the type of conjunction between them
– cluster with higher average frequency was deemed
to contain positive adjectives, lower average
frequency meant negative sentiment
• Limitations
– algorithm limited to adjectives (later extended to
adverbs by Turney and Littman, 2002)
– requires large amounts of hand-labeled data to
produce accurate results
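A minimal Python sketch of the intuition only (not the authors' log-linear model and clustering): adjectives joined by "and" are assumed to share orientation, "but" signals opposite orientation, and labels are propagated over the resulting graph. The word pairs and counts are invented for illustration.

```python
# Minimal sketch of the conjunction-constraint intuition behind
# Hatzivassiloglou and McKeown (1997); toy data, not the original algorithm.
from collections import deque

# Toy conjunction evidence: +1 for "and" (same orientation), -1 for "but" (opposite).
edges = {
    ("simple", "elegant"): +1,     # "simple and elegant"
    ("elegant", "clumsy"): -1,     # "elegant but clumsy"
    ("clumsy", "awkward"): +1,
    ("simple", "awkward"): -1,
}

def propagate(seed_word, seed_label, edges):
    """Breadth-first propagation of +1/-1 orientation labels along conjunction edges."""
    graph = {}
    for (a, b), sign in edges.items():
        graph.setdefault(a, []).append((b, sign))
        graph.setdefault(b, []).append((a, sign))
    labels = {seed_word: seed_label}
    queue = deque([seed_word])
    while queue:
        word = queue.popleft()
        for neighbor, sign in graph[word]:
            if neighbor not in labels:
                labels[neighbor] = labels[word] * sign
                queue.append(neighbor)
    return labels

print(propagate("simple", +1, edges))
# {'simple': 1, 'elegant': 1, 'awkward': -1, 'clumsy': -1}
```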
Web As Corpus
• Peter Turney (Turney, 2002; Turney and
Littman, 2002; Turney and Littman,
2003)
– More general method, does not require
previously annotated data for training
– Induce sentiment of a word from the strength
of its association with 14 seed words with
known positive or negative semantic
orientation
– Two methods for measuring association:
• Pointwise mutual information (PMI)
• Latent semantic analysis (LSA)
– Used web as data
• ran 14 queries on AltaVista using NEAR operator
to acquire co-occurrence statistics with the 14
seed words
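A hedged sketch of the SO-PMI computation: the original work used 14 seed words (7 positive, 7 negative) and AltaVista NEAR hit counts, while the seed lists, co-occurrence counts, and helper names below are toy stand-ins for illustration.

```python
import math

# Sketch of the SO-PMI idea: a word's orientation is estimated from how strongly it
# co-occurs with positive vs. negative seed words. Counts are invented toy numbers
# standing in for the NEAR hit counts of the original method.
POS_SEEDS = ["good", "excellent"]
NEG_SEEDS = ["bad", "poor"]

# hits[w] = contexts containing w; near[(w, s)] = contexts where w occurs near seed s.
hits = {"good": 1000, "excellent": 200, "bad": 900, "poor": 300, "superb": 50}
near = {("superb", "good"): 30, ("superb", "excellent"): 12,
        ("superb", "bad"): 4, ("superb", "poor"): 2}

def so_pmi(word, smoothing=0.01):
    """Sum of PMI(word, positive seeds) minus sum of PMI(word, negative seeds);
    terms that do not depend on the seed cancel when the seed sets are equal in size."""
    pos = sum(math.log2((near.get((word, s), 0) + smoothing) / hits[s]) for s in POS_SEEDS)
    neg = sum(math.log2((near.get((word, s), 0) + smoothing) / hits[s]) for s in NEG_SEEDS)
    return pos - neg   # > 0 suggests positive orientation, < 0 negative

print(so_pmi("superb"))   # positive for these toy counts
```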
Turney, con’t
• Results evaluated against GI on a
variety of test settings gave up to
97.11% accuracy for top 25% of words
• Size of the corpus had a considerable
effect
– 10-million word corpus instead of the full
Web content reduced accuracy to 61.26-68.74%
• LSA performed relatively better than
PMI on 10m word corpus
– LSA more complex, harder to implement
End of An Era
• Due to its simplicity, high
accuracy and domain independence,
the PMI method became popular
• In 2005, AltaVista discontinued
support for NEAR operator, upon
which the method relied
• Attempts to substitute NEAR with
AND led to considerable
deterioration in system
performance
Other Approaches
• Bethard et al. (2004)
– Used two different methods to acquire
opinion words from corpora
• calculated frequency of co-occurrence with seed
words taken from Hatzivassiloglou and McKeown,
computed the log-likelihood ratio
• computed relative frequencies of words in
subjective and objective documents
– First method produced better results for
adverbs and nouns, gave higher precision but
lower recall for adjectives
– Second method worked best for verbs
Other Approaches
• Kim and Hovy (2005)
– separated opinion words from non-opinion words by computing their
relative frequency in subjective
(editorial) and objective (non-editorial) texts from TREC data
• Riloff et al. (2003), Grefenstette
et al. (2006)
– Used syntactic patterns
– Learn lexico-syntactic expressions
characteristic for subjective nouns
Automatically-created
Lists
Dictionary-Based Methods
• Addresses some of the limitations
of corpus-based methods
• Use semantic resources such as
WordNet, thesauri
• Two approaches:
– Rely on thesaural relations between
words (synonymy, antonymy, hyponymy,
hyperonymy) to find similarity
between seed words and other words
– Exploit information contained in
definitions and glosses
Use of WordNet
• Kim and Hovy (2004,2005)
– Extended word lists by using
WordNet synsets
– Ranked lists based on sentiment
polarity assigned to each word
in synset based on WordNet
distance from positive and
negative seed words
• Similar approach used by Hu
and Liu (2004)
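A minimal sketch of this kind of seed-list expansion using NLTK's WordNet interface (an assumed tool, not the authors' implementation; the polarity-ranking step is omitted):

```python
# Sketch: expand positive/negative seed lists through WordNet synonyms, in the spirit
# of Kim and Hovy (2004, 2005). Requires NLTK with the WordNet corpus downloaded.
from nltk.corpus import wordnet as wn

def expand(seeds, depth=2):
    """Collect synonyms of the seed words, repeating the expansion `depth` times."""
    current = set(seeds)
    for _ in range(depth):
        expanded = set(current)
        for word in current:
            for synset in wn.synsets(word):
                expanded.update(lemma.name() for lemma in synset.lemmas())
        current = expanded
    return current

positive = expand(["good", "excellent", "happy"])
negative = expand(["bad", "poor", "sad"])
print(len(positive), "positive candidates,", len(negative), "negative candidates")
```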
SentiWordNet
• Developed by Esuli and
Sebastiani
– Trained several classifiers to
give rating for positive,
negative, objective for each
synset in WordNet 2.0
• Scores from 0 - 1
– Freely available
• http://sentiwordnet.isti.cnr.it
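SentiWordNet scores can be inspected, for example, through the copy bundled with NLTK (which wraps a later WordNet release than the 2.0 version described above). A minimal sketch, assuming the `sentiwordnet` and `wordnet` corpora have been downloaded:

```python
# Sketch: read per-synset positive/negative/objective scores from SentiWordNet via NLTK.
# Assumes nltk.download('sentiwordnet') and nltk.download('wordnet') have been run.
from nltk.corpus import sentiwordnet as swn

for senti_synset in swn.senti_synsets("happy", "a"):   # 'a' = adjective
    print(senti_synset.synset.name(),
          "pos =", senti_synset.pos_score(),
          "neg =", senti_synset.neg_score(),
          "obj =", senti_synset.obj_score())
```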
Beyond Synsets
• Kamps et al. (2004)
– Tagged words with Osgood’s three
semantic dimensions
– Computed shortest path through
WordNet relations connecting a word
to words representative of the three
categories (e.g., good and bad for
evaluation)
• Esuli and Sebastiani (2005)
– Classified words in WordNet into
positive and negative based on
synsets, glosses and examples
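An illustrative reimplementation of the Kamps et al. idea with NLTK (not the authors' code): distance is the length of the shortest chain of synonymy links, and a word's evaluation score compares its distances to good and bad.

```python
# Sketch of Kamps et al. (2004): shortest synonymy path in WordNet, compared for
# "good" vs. "bad". Requires NLTK with the WordNet corpus; illustrative only.
from collections import deque
from nltk.corpus import wordnet as wn

def neighbors(word):
    """Adjectives sharing a synset with `word` (synonymy links only)."""
    result = set()
    for synset in wn.synsets(word, pos=wn.ADJ):
        result.update(lemma.name() for lemma in synset.lemmas())
    result.discard(word)
    return result

def distance(source, target, max_depth=8):
    """Length of the shortest synonymy path between two words, or None if not found."""
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        word, d = queue.popleft()
        if word == target:
            return d
        if d < max_depth:
            for nxt in neighbors(word):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, d + 1))
    return None

def eva(word):
    """EVA(word) = (d(word, bad) - d(word, good)) / d(good, bad);
    positive values lean toward 'good'."""
    d_good, d_bad, base = distance(word, "good"), distance(word, "bad"), distance("good", "bad")
    if None in (d_good, d_bad, base):
        return None
    return (d_bad - d_good) / base

print(eva("honest"))
```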
Further Beyond Synsets
• Andreevskaia and Bergler (2006)
– Take advantage of the semantic
similarity between glosses and head
words
– Start with a list of manually
annotated words, expand with synonyms
and antonyms, search WordNet glosses
for occurrences of seed words
– If a gloss contains a word with known
sentiment, head word deemed to have
same sentiment
– Suggest that overlap measure reflects
the centrality of the word in the
sentiment category (see the sketch below)
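A minimal sketch of the gloss-overlap step, with invented seed lists and NLTK's WordNet as an assumed stand-in for the authors' setup; the overlap count doubles as the crude centrality signal mentioned above.

```python
# Sketch of the gloss-based pass: a synset whose gloss mentions a seed word of known
# sentiment is assigned that sentiment. Seed lists are illustrative; requires NLTK
# with the WordNet corpus downloaded.
import re
from nltk.corpus import wordnet as wn

positive_seeds = {"good", "pleasant", "pleasing", "excellent"}
negative_seeds = {"bad", "unpleasant", "displeasing", "poor"}

def gloss_sentiment(synset):
    tokens = set(re.findall(r"[a-z]+", synset.definition().lower()))
    pos_overlap = len(tokens & positive_seeds)   # overlap size doubles as a crude
    neg_overlap = len(tokens & negative_seeds)   # centrality score for the sense
    if pos_overlap > neg_overlap:
        return "positive"
    if neg_overlap > pos_overlap:
        return "negative"
    return "unknown"

for word in ("delightful", "dreadful"):
    for synset in wn.synsets(word):
        print(word, synset.name(), gloss_sentiment(synset))
```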
Role of Neutrals
• Most work cited so far
classifies words as positive
or negative
• Results vary between 60-80%
agreement with GI as gold
standard
• Adding neutrals severely
reduces accuracy by 10-20%,
depending on part of speech
Problems
• Words without strong positive or
negative connotations are difficult to
categorize accurately
– Strength of +/- affinity can be used as a
measure
– Highest accuracy for words on extremes of
the +/- poles
• Many words have both sentiment-bearing
and neutral senses
– E.g., great typically tagged as positive,
but according to statistics in WordNet, used
neutrally in 75% of its occurrences
– Solve this by using sense-tagged word lists
Sense-tagged lists
• The need for sense-level
sentiment annotation has
recently attracted
considerable attention
• Development of methods to
devise sense-tagged word
lists
Sense-tagged Word Lists
• Andreevskaia and Bergler (2006)
– applied gloss-based sentiment tagging to
sense level
– extended their system by adding a word sense
disambiguation module (Senti-Sense)
• used syntactic patterns to disambiguate between
sentiment-bearing and neutral senses
– learned generalized adjective-noun patterns
for sentiment-bearing adjectives from
unambiguous data
– abstracted learned patterns to higher levels
of hypernym hierarchies using predetermined
propagation rules
– applied learned patterns to disambiguate
adjectives with multiple senses in order to
locate senses that bear sentiment
Sense-tagged Word Lists
• Esuli and Sebastiani (2007)
– Application of a random-walk
(PageRank) algorithm to
sentiment tagging of synsets
– Takes advantage of the graph-like
structure of the WordNet hierarchy
Sense-tagged Word Lists
• Wiebe and Mihalcea (2006)
– Sense-level tagging for subjectivity
(i.e., neutrals vs. sentiment-bearing
words)
– Automatic method for sense-level
sentiment tagging based on Lin’s
(1998) similarity measure
– Acquire a list of top-ranking
distributionally similar words
– Compute WordNet-based measure of
semantic similarity for each word in
the list
Beyond Positive and
Negative
• Ide (2006)
– Bootstrapped sense-tagged word lists for semantic
categories based on
• FrameNet frames (e.g., commitment, reasoning, etc.)
• GI categories (hostile/friendly, weak/strong,
submit/dominate, power/cooperation)
– Treated lexical units associated with a given
frame as a “bag of words”
– Used WordNet::Similarity to compute similarity
among senses of the words
• Relation set consisted of different pairwise
combinations of synsets, glosses, examples,
hypernyms, and hyponyms
– Based on results, compute “suresenses” (strongest
association) and retain; iterate
– Augment with “suresenses” for synsets, hypernyms,
hyponyms
• Resulting lists very high on precision
(98%), lower on recall
• Used hierarchical clustering to group
senses in a given category into
positive and negative and finer-grained
distinctions
Results for “judgment-communication”
• [Cluster diagram: hierarchical clusters of word senses in the judgment-communication category]
– acclaim: acclaim1, extol1, laud1, commend4, commend1, praise1, cite2
– denigrate: denigrate1, deprecate2, execrate2
– further clusters labeled belittle, condemn, charge, accuse, and ridicule group senses such as belittle2, disparage1, reprehend1, censure1, denounce1, remonstrate3, blame2, castigate1, condemn1, decry1, excoriate1, deprecate1, accuse2, charge2, recriminate1, accuse1, denigrate2, deride1, ridicule1, gibe2, scoff1, mock1, scoff2, remonstrate2
Manually Annotated
Corpora
• Scarcity of manually annotated resources for
system training and evaluation
• Some work uses user-created rankings in online
product, book, or movie reviews
– ranking scale (good-bad, liked-disliked, etc.)
– easily available, fast to collect
– Drawbacks
• contain significant amount of noise
– erroneous ratings
– misspellings
– phrases in different languages
• exacerbated when researchers automatically break
reviews into sentences or snippets
Available Corpora
• MPQA
– Level of annotation: phrases and sentences
– Annotation type: private states
– Corpus size: 535 documents (10,657 sentences)
– Link: http://www.cs.pitt.edu/wiebe/mpqa
• Opinion corpus
– Level of annotation: expressions and sentences
– Annotation type: subjectivity and objectivity
– Corpus size: 2 sets of documents, 500 sentences each, from WSJ Treebank
– Link: http://www.cs.pitt.edu/wiebe/pub4.html
• Product reviews
– Level of annotation: product features
– Annotation type: sentiment features
– Corpus size: 500 reviews
– Link: http://www.cs.uic.edu/liub/FBS/FBS.html
• SemEval-07 Task 14 dataset
– Level of annotation: headlines
– Annotation type: sentiment on a -100 to +100 polarity scale, 6 basic emotions
– Corpus size: 2225 news headlines
– Link: http://www.cse.unt.edu/~rada/affectivetext/
Inter-annotator
Agreement
• Rate of agreement among human annotators may
reveal important insights about the task,
provide a critical baseline for system
evaluation
• Few inter-annotator agreement studies conducted
to date
• Inter-annotator agreement on popular sentiment
genres such as movie reviews, blogs, so far
unexplored
• Agreement depends on
– unit annotated (sentence or text)
– annotation type (subjectivity or sentiment)
– domain and genre
IA Studies
• Sentence subjectivity labels
(Wiebe et al., 1999)
– annotators classified sentence
as subjective if it contained
any significant expression of
subjectivity
– multiple rounds of training and
annotation instructions
adjustment
– pairwise Kappa over WSJ test set
ranged from 0.59 to 0.76
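For reference, the pairwise figures above are Cohen's kappa: observed agreement corrected for the agreement expected by chance. A minimal two-annotator implementation on toy subjectivity labels:

```python
# Minimal two-annotator Cohen's kappa, the statistic behind the pairwise figures above.
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy sentence-level subjectivity judgments from two annotators.
a = ["subj", "subj", "obj", "obj", "subj", "obj", "subj", "obj"]
b = ["subj", "obj",  "obj", "obj", "subj", "obj", "subj", "subj"]
print(round(cohen_kappa(a, b), 2))   # 0.5 for this toy example
```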
Variable Results
• Kim and Hovy (2004)
– Relatively high agreement
(κ=0.91) between two annotators
who assigned positive, negative,
and n/a labels to 100 newspaper
sentences
• Gamon and Aue (2005)
– Similar study using car reviews
produced pairwise Kappa of 0.70
- 0.80
Granularity
• Strapparava and Mihalcea (2007)
– Study suggests inter-annotator
agreement substantially lower on
fine-grained types of annotation
– Six annotators assigned sentiment
score and Ekman’s six basic emotions
scores to news headlines
– Agreement (Pearson correlation) 78.01
for sentiment
– Agreement for emotion labels ranged
from 36.07 (surprise) to 68.19
(sadness)
IA for texts
• Wiebe et al., 2001
– Two genres
• flames (hostile, inflammatory
messages)
– κ=0.78
• opinion pieces from WSJ
– κ=0.94-0.95
– Domain and genre may affect
level of inter-annotator
agreement
Sentiment Analysis
• Ultimate goal of sentiment and
subjectivity annotation is the analysis
of clauses, sentences, and texts
• Resources :
– Lists
– Annotated (validated) corpora
• Subjectivity analysis uses MPQA corpus
– reliable gold standard for training and evaluation of
newspaper texts
• Sentiment analysis must rely on small manually
annotated test sets
– created ad hoc
– seldom made publicly available
– too small for machine learning methods
Sentiment Analysis
• Single text often includes both
positive and negative sentences
• Sentences and clauses regarded as the
most natural units for
sentiment/subjectivity annotation
– Sentiment usually more homogeneous in a
sentence than in a whole text
– But harder to identify due to a limited
number of clues
– Work on improving system performance by
performing analysis simultaneously at
different levels
• Sentence-level analysis has high precision
• Text-level analysis has high recall
Words vs. Other
Features
• Words/unigrams provide good accuracy in
sentence-level sentiment tagging
– Yu and Hatzivassiloglou (2003)
• no significant improvement when bigrams or
trigrams added to feature set
– Kim and Hovy (2004)
• presence of sentiment-bearing words works
better than more sophisticated scoring methods
– Riloff et al. (2006)
• similar results for unigrams vs. unigrams plus
bigrams, extraction patterns
• subsumption hierarchy and feature selection
brought less than 1% improved accuracy
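A toy sketch of the word-presence baseline these studies found competitive at the sentence level; the clue lists are invented for illustration.

```python
# Sketch of sentence-level tagging by presence of sentiment-bearing words,
# the simple baseline the studies above found hard to beat. Word lists are illustrative.
positive_words = {"good", "great", "excellent", "love", "wonderful"}
negative_words = {"bad", "poor", "terrible", "hate", "awful"}

def tag_sentence(sentence):
    tokens = sentence.lower().split()
    pos = sum(tok in positive_words for tok in tokens)
    neg = sum(tok in negative_words for tok in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(tag_sentence("The acting was great but the plot was terrible and awful"))  # negative
```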
Other Features vs. Words
• Some studies show gains with use of
features
– Wilson et al. (2005), Andreevskaia et al. (2007)
• syntactic properties of the phrase (role in sentence
structure, presence of modifiers and valence
shifters) yield statistically significant increases
in system performance for sentiment and subjectivity
analysis
– Gains in subjectivity analysis:
• presence of complex adjectival phrases (Bethard et
al., 2004)
• similarity scores (Yu and Hatzivassiloglou, 2003)
• position in the paragraph (Wiebe et al., 1999)
– Gains in sentiment analysis
• syntactic patterns and negation (Hurst and Nigam,
2004; Mulder et al., 2004; Andreevskaia et al.,
2007)
• knowledge about the opinion holder (Kim and Hovy,
2004)
• target of the sentiment (Hu and Liu, 2004)
Multiple Features
• Some experiments suggest improvement
gained when multiple feature sets
combined
– Hatzivassiloglou and Wiebe, 2000
• combination of lists of adjectives tagged with
dynamic, polarity, and gradability labels best
predictor of sentence subjectivity
– Riloff et al. (2003)
• accuracy of subjectivity tagging results
improved with addition of each new feature (25
in all)
– Gamon and Aue (2005)
• sentiment tagging using larger number of
features increased average precision and recall
SemEval-2007 Affective Text
Task
• Opportunity to compare systems for sentiment
analysis run on the same dataset of 2000
manually annotated news headlines
• Machine learning methods had highest recall at
the cost of low precision
– naïve Bayes-based CLaC-NB (Andreevskaia and
Bergler, 2007)
– word-space model-based SICS (Sahlgren et al.,
2007)
• Knowledge-based unsupervised approaches had
highest precision, but low recall because headlines
contain few sentiment clues
– CLaC (Andreevskaia and Bergler, 2007)
– UPAR7 (Chaumartin, 2007)
General Conclusion (so far)
• Subjectivity classification
– statistical approaches that use naïve Bayes
or SVM significantly outperform non-statistical techniques
• Binary (positive-negative) sentiment
classification
– non-statistical methods that rely on
presence of sentiment markers in a sentence,
or on strength of sentiment associated with
these markers, yield as good or better
accuracy than statistical approaches
• BUT: both methods show some strengths
in both tasks
• Need LOTS more research
Text-level Analysis
• Features
– Choice of features used by a
sentiment annotation system is a
critical factor
– Wide range of features used:
• lists of words
• lemmas or unigrams
• bigrams
• higher order n-grams
• part-of-speech
• syntactic properties of surrounding
context
• etc.
Use of Features
• Use of multiple models can
improve the performance of
both sentiment and
subjectivity classifiers
– Use of different features within
the same classifier or in a
community of several classifiers
improves system performance
Features
• Studies show improvement when using
n-grams vs. unigrams (Dave et al., 2003;
Cui et al., 2006; Aue and Gamon, 2005;
Wiebe et al., 2001; Riloff et al.,
2006)
• Improvement when n-gram models / word
lists augmented with
– context information (Wiebe et al., 2001b;
Andreevskaia et al., 2007)
– features associated with syntactic structure
(Gamon, 2004)
– combination of words annotated for semantic
categories related to sentiment or
subjectivity (Whitelaw et al., 2005;
Fletcher and Patrick, 2005; Mullen and
Collier, 2004)
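A sketch of how such unigram vs. unigram-plus-bigram comparisons are typically set up, here with scikit-learn as an assumed tool (the cited studies used their own classifiers and corpora):

```python
# Sketch: compare unigram vs. unigram+bigram feature sets with a naive Bayes classifier.
# scikit-learn is assumed here for illustration; toy training data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["a wonderful and moving film", "simply terrible acting",
               "an excellent, gripping story", "dull plot and poor dialogue"]
train_labels = ["pos", "neg", "pos", "neg"]

for ngram_range in [(1, 1), (1, 2)]:            # unigrams only vs. unigrams + bigrams
    model = make_pipeline(CountVectorizer(ngram_range=ngram_range), MultinomialNB())
    model.fit(train_texts, train_labels)
    print(ngram_range, model.predict(["a gripping and moving story"]))
```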
Feature Selection
• Costly or computationally unfeasible to use all
features
– Aue and Gamon (2005)
• n-grams selected based on Expectation Maximization
algorithm
– Riloff et al. (2006)
• Use a subsumption hierarchy for n-grams (unigrams
and bigrams) and extraction patterns
– if a feature’s words and dependencies are a superset of
a more general ancestor in the hierarchy, discard
– only features with higher information gain were allowed
to subsume less informative ones
• combined use of subsumption and traditional feature
selection improves performance of both
subjective/objective and positive/negative
text-level classification
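A simplified illustration of the subsumption idea (not the full hierarchy of Riloff et al.): a bigram is kept only if its information gain exceeds that of the unigrams it is built from; the documents and labels are toy data.

```python
# Hedged sketch of feature subsumption: discard a bigram unless it is more informative
# (higher information gain) than its component unigrams. Toy data, simplified logic.
import math
from collections import Counter

def information_gain(feature, documents, labels):
    """IG of a binary 'feature present in document' indicator with respect to the label."""
    def entropy(items):
        total = len(items)
        return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())
    present = [lab for doc, lab in zip(documents, labels) if feature in doc]
    absent  = [lab for doc, lab in zip(documents, labels) if feature not in doc]
    split = sum((len(part) / len(labels)) * entropy(part) for part in (present, absent) if part)
    return entropy(labels) - split

docs = [{"not", "good", "not good"}, {"good", "very", "very good"},
        {"not", "bad", "not bad"},  {"bad", "plot", "bad plot"}]
labels = ["neg", "pos", "pos", "neg"]

bigram = "not good"
keep = information_gain(bigram, docs, labels) > max(
    information_gain("not", docs, labels), information_gain("good", docs, labels))
print("keep bigram 'not good'?", keep)   # True for this toy data
```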
Part of Speech
• Subjectivity
– Adjectives best predictors of
subjectivity (Hatzivassiloglou and
Wiebe, 2000)
– Modals, pronouns, adverbs, cardinal
numbers also used as subjectivity
clues (Wiebe et al., 1999; Bruce and
Wiebe, 2000)
• Sentiment
– Combined use of words from all parts of speech
produced more accurate tags
(Blair et al., 2004; Salvetti et al., 2004)
Feature Generation
• Most sentiment classifiers use standard
machine learning techniques to learn
and select features from labeled
corpora
– Works well when large labeled corpora
available for training and validation (e.g.,
movie reviews)
– Falls short when training data is
• scarce
• different domain or topic
• different time period
• Led to increased interest in unsupervised and
semi-supervised approaches to feature
generation
Some Promising
Research
• Systems trained on a small number of labeled
examples and large quantities of unlabelled
in-domain data perform relatively well (Aue and
Gamon, 2005)
• Structural correspondence learning applied to
small number of labeled examples sufficient to
adapt to new domain (Blitzer et al., 2007)
• So far performance of these methods inferior to
supervised approaches and knowledge-based
methods
• Availability of word lists and clues makes
knowledge-based approaches an attractive
alternative to supervised machine learning when
labeled data is scarce
Domain Effects
• Variety of domains used in
sentiment analysis
– movie, music, book and other
entertainment reviews
– product reviews
– Blogs
– dream corpus
– Etc.
• Choice of domain can have a major
impact on results
Movie Reviews
• Popular domain in sentiment research
• Positive and negative words/expressions
do not necessarily convey the opinion
holder’s attitude
– E.g., “evil” used in movie reviews when
referring to characters or plot, does not
convey sentiment toward the movie itself
(Turney, 2002)
• Simple counting of positive and
negative clues in movie review texts
insufficient
• Clues acquired from out-of-domain
sources often fail
Product Reviews
• Sentiment towards the whole
product is the sum of the sentiment
towards its parts,
components, and attributes
(Turney, 2002)
• General word lists perform
better on product reviews
(Turney, 2002; Kennedy and
Inkpen, 2006)
Attitude Influence
• Texts with positive sentiment easier to
classify than negative ones
– Kennedy and Inkpen, 2006; Hurst and Nigam,
2004; Dave et al., 2003; Koppel and Schler,
2006; Chaovalit and Zhou, 2005
• Possible explanations:
– positive documents more uniform (Dave et
al., 2003)
– positive clues have higher discriminant
value (Koppel and Schler, 2006)
– negative texts characterized by extensive
use of negations and other valence shifters
that reverse the sentiment conveyed by
individual words (e.g., not bad) (Pang and
Lee, 2004)
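A toy sketch of the simple valence-shifter treatment alluded to above: a negation word within a few tokens before a clue flips its polarity; the window size and word lists are illustrative.

```python
# Sketch of simple valence-shifter handling: a negation word within a small window
# before a sentiment clue flips its polarity (e.g., "not bad" counts as positive).
NEGATIONS = {"not", "never", "no", "hardly"}
POLARITY = {"good": 1, "great": 1, "bad": -1, "terrible": -1}

def score(sentence, window=3):
    tokens = sentence.lower().split()
    total = 0
    for i, tok in enumerate(tokens):
        if tok in POLARITY:
            value = POLARITY[tok]
            if any(t in NEGATIONS for t in tokens[max(0, i - window):i]):
                value = -value
            total += value
    return total

print(score("the movie was not bad"))         #  1
print(score("the plot was not good at all"))  # -1
```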
Attitude Influence
• Improvement in accuracy when
valence shifters taken into
account (Kennedy and Inkpen, 2006;
Andreevskaia et al., 2007)
– but negative impact reported when
negation included in feature set
(Dave et al., 2003)
• Use of balanced evaluation sets
with equal number of positive and
negative documents has become a
standard in sentiment research
Classification Algorithms
• Wide variety of classification approaches used:
– simple keyword counting methods, with or without
scoring
– rule-based methods
– content analytical methods (statistical)
– SVM, naïve Bayes and other statistical classifiers
used alone, sequentially, or as a community
• Comparison of results does not provide a
definite answer as to which of these methods is
the best for sentiment or subjectivity tagging
– choice of features and training domain have more
impact on accuracy than choice of classification
algorithm
– comparison of performance of systems evaluated on
different domains or different feature sets not
conclusive
Summary
• Sentiment and subjectivity
analysis has evolved into a
strong research stream in NLP
• State-of-the-art systems can
reach up to 90% accuracy on
certain domains
– But need a generally applicable
method
Research Directions
• Development of semi-supervised machine-learning
approaches that will maximize
the usefulness of the available
resources and ensure domain adaptation
with limited in-domain data
• Creation of reliable and extensive
resources such as lists of words and
expressions, syntactic patterns,
combinatorial rules, and annotated
corpora
• Creation of uniform ways to denote and
represent sentiment and subjectivity
annotation
Bibliography
• Andreevskaia, A. and S. Bergler: 2007, ‘CLaC and CLaC-NB: Knowledge-based
and Corpus-based Approaches to Sentiment Tagging’. In: 4th
International Workshop on Semantic Evaluations (SemEval 2007).
Prague, Czech Republic.
• Andreevskaia, A., S. Bergler, and M. Urseanu: 2007, ‘All Blogs are
not made Equal’. In: International Conference on Weblogs and Social
Media (ICWSM-2007). Boulder, Colorado.
• Aue, A. and M. Gamon: 2005, ‘Customizing Sentiment Classifiers to
New Domains: a Case Study’. In: RANLP-05, the International
Conference on Recent Advances in Natural Language Processing.
Borovets, Bulgaria.
• Bethard, S., H. Yu, A. Thornton, V. Hatzivassiloglou, and D.
Jurafsky: 2004, ‘Automatic Extraction of Opinion Propositions and
their Holders’. In: Exploring Attitude and Affect in Text: theories
and application (AAAI-EAAT 2004). Stanford University.
• Blair, L., A. Jaharria, S. Lewis, T. Oda, C. Reichenbach, J.
Rueppel, and F. Salvetti: 2004, ‘Impact of Lexical Filtering on
Semantic Orientation’. In: Exploring Attitude and Affect in Text:
theories and application (AAAI-EAAT 2004). Stanford University.
Bibliography
• Breck, E., Y. Choi, and C. Cardie: 2007, ‘Identifying expressions of
opinion in context’. In: Proceedings of the Twentieth International
Joint Conference on Artificial Intelligence (IJCAI-2007). Hyderabad,
India.
• Bruce, R. and J. Wiebe: 2000, ‘Recognizing subjectivity: A case
study of manual processing’. Natural Language Engineering 5(2), 187–205.
• Chaovalit, P. and L. Zhou: 2005, ‘Movie Review Mining: a Comparison
between Supervised and Unsupervised Classification Approaches’. In:
Proceedings of HICSS-05, the 38th Hawaii International Conference on
System Sciences.
• Chaumartin, F.-R.: 2007, ‘UPAR7: A Knowledge-based System for
Headline Sentiment Tagging’. In: 4th International Workshop on
Semantic Evaluations (SemEval 2007). Prague, Czech Republic.
• Cui, H., V. Mittal, and M. Datar: 2006, ‘Comparative Experiments on
Sentiment Classification for Online Product Reviews’. In:
Proceedings of the Twenty-First National Conference on Artificial
Intelligence (AAAI-2006). Boston, MA.
Bibliography
• Das, S. R. and M. Y. Chen: 2001, ‘Yahoo! For Amazon: Sentiment
extraction from small talk on the Web’. In: Asia Pacific Finance
Association Annual Conference (APFA01).
• Dave, K., S. Lawrence, and D. M. Pennock: 2003, ‘Mining the peanut
gallery: opinion extraction and semantic classification of product
reviews’. In: (WWW03). Budapest, Hungary, pp. 519–528.
• Dredze, M., J. Blitzer, and F. Pereira: 2007, ‘Biographies,
Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment
Classification’. In: 45th Annual Meeting of the Association for
Computational Linguistics (ACL 2007). Prague, Czech Republic.
• Dunning, T.: 1993, ‘Accurate Methods for the Statistics of Surprise
and Coincidence’. Computational Linguistics 19, 61–74.
• Ekman, P.: 1993, ‘Facial expression of emotion’. American
Psychologist 48, 384–392.
• Fletcher, J. and J. Patrick: 2005, ‘Evaluating the Utility of
Appraisal Hierarchies as a Method for Sentiment Classification’. In:
Proceedings of Australian Language Technology Workshop 2005. Sydney,
Australia, pp. 134–142.
Bibliography
• Gamon, M.: 2004, ‘Sentiment classification on customer feedback
data: noisy data, large feature vectors, and the role of linguistic
analysis’. In: Proceedings of COLING-04, the 20th International
Conference on Computational Linguistics. Geneva, CH, pp. 841–847.
• Gamon, M. and A. Aue: 2005, ‘Automatic identification of sentiment
vocabulary: exploiting low association with known sentiment terms’.
In: Proceedings of the ACL-05 Workshop on Feature Engineering for
Machine Learning in Natural Language Processing. Ann Arbor, US.
• Goldberg, A. and J. Zhu: 2006, ‘Seeing stars when there aren’t many
stars: Graph-based semi-supervised learning for sentiment
categorization’. In: Proceedings of the HLT-NAACL 2006 Workshop on
Textgraphs: Graph-based Algorithms for Natural Language Processing.
Boston, MA.
• Hatzivassiloglou, V. and J. Wiebe: 2000, ‘Effects of Adjective
Orientation and Gradability on Sentence Subjectivity’. In: 18th
International Conference on Computational Linguistics (COLING-2000).
• Hu, M. and B. Liu: 2004, ‘Mining and summarizing customer reviews’.
In: KDD-04. pp. 168–177.
• Hurst, M. and K. Nigam: 2004, ‘Retrieving topical sentiments from
online document collections’. In: Exploring Attitude and Affect in
Text: theories and application (AAAI-EAAT 2004). Stanford
University.
Bibliography
• Kennedy, A. and D. Inkpen: 2006, ‘Sentiment Classification of Movie
Reviews Using Contextual Valence Shifters’. Computational
Intelligence 22(2), 110–125.
• Kim, S.-M. and E. Hovy: 2004, ‘Determining the Sentiment of
Opinions’. In: Proceedings of COLING-04, the Conference on
Computational Linguistics. Geneva, CH, pp. 1367–1373.
• Kim, S.-M. and E. Hovy: 2005a, ‘Automatic Detection of Opinion
Bearing Words and Sentences’. In: Companion Volume to the
Proceedings of IJCNLP-05, the Second International Joint Conference
on Natural Language Processing. Jeju Island, KR, pp. 61–66.
• Kim, S.-M. and E. Hovy: 2005b, ‘Identifying Opinion Holders for
Question Answering in Opinion Texts’. In: Proceedings of the AAAI-05
Workshop on Question Answering in Restricted Domains. Pittsburgh,
US.
• Kim, S.-M. and E. Hovy: 2006, ‘Extracting Opinions, Opinion Holders,
and Topics Expressed in Online News Media Text’. In: Proceedings of
the ACL/COLING Workshop on Sentiment and Subjectivity in Text.
Sydney, Australia.
• Koppel, M. and J. Schler: 2006, ‘The importance of neutral examples
for learning sentiment’. Computational Intelligence 22(2), 100–116.
Bibliography
• Mao, Y. and G. Lebanon: 2006, ‘Sequential Models for Sentiment
Prediction’. In: Proceedings of the ICML Workshop on Learning in
Structured Output Spaces.
• McDonald, R., K. Hannan, T. Neylon, M. Wells, and J. Reynar: 2007,
‘Structured Models for Fine-to-Coarse Sentiment Analysis’. In:
Proceedings of the 45th Annual Meeting of the Association for
Computational Linguistics (ACL-2007). Prague, Czech Republic.
• Mulder, M., A. Nijholt, M. den Uyl, and P. Terpstra: 2004, ‘A
Lexical Grammatical Implementation of Affect’. In: Proceedings of
TSD-04, the 7th International Conference Text, Speech and Dialogue,
Vol. 3206 of Lecture Notes in Computer Science. Brno, CZ, pp. 171–178.
• Mullen, T. and N. Collier: 2004, ‘Sentiment analysis using support
vector machines with diverse information sources’. In: Proceedings
of EMNLP-04, 9th Conference on Empirical Methods in Natural Language
Processing. Barcelona, Spain.
• Pang, B. and L. Lee: 2004, ‘A sentiment education: Sentiment
analysis using subjectivity summarization based on minimum cuts’.
In: ACL. pp. 271–278. arXiv:cs.CL/040958.
Bibliography
• Pang, B., L. Lee, and S. Vaithyanathan: 2002, ‘Thumbs up? Sentiment
classification using machine learning techniques’. In: Conference on
Empirical Methods in Natural Language Processing (EMNLP-2002). pp.
79–86.
• Read, J.: 2005, ‘Using emoticons to reduce dependency in machine
learning techniques for sentiment classification’. In: Proceedings
of the ACL-2005 Student Research Workshop. Ann Arbor, MI.
• Riloff, E., S. Patwardhan, and J. Wiebe: 2006, ‘Feature Subsumption
for Opinion Analysis’. In: Proceedings of EMNLP-06, the Conference
on Empirical Methods in Natural Language Processing. Sydney, AUS,
pp. 440–448.
• Riloff, E., J. Wiebe, and T. Wilson: 2003, ‘Learning subjective
nouns using extraction pattern bootstrapping’. In: W. Daelemans and
M. Osborne (eds.): Proceedings of CONLL-03, 7th Conference on
Natural Language Learning. Edmonton, CA, pp. 25–32.
• Sahlgren, M., J. Karlgren, and G. Eriksson: 2007, ‘SICS: Valence
Annotation Based on Seeds in Word Space’. In: 4th International
Workshop on Semantic Evaluations (SemEval 2007). Prague, Czech
Republic.
Bibliography
• Salvetti, F., S. Lewis, and C. Reichenbach: 2004, ‘Impact of lexical
filtering on overall polarity identification’. In: Exploring
Attitude and Affect in Text: theories and application (AAAI-EAAT
2004). Stanford University.
• Snyder, B. and R. Barzilay: 2007, ‘Multiple Aspect Ranking using the
Good Grief Algorithm’. In: Proceedings of NAACL-2007. Washington,
DC.
• Stoyanov, V., C. Cardie, D. Litman, and J. Wiebe: 2004, ‘Evaluating
an Opinion Annotation Scheme Using a New Multi-Perspective Question
and Answer Corpus’. In: J. G. Shanahan, J. Wiebe, and Y. Qu (eds.):
Proceedings of the AAAI Spring Symposium on Exploring Attitude and
Affect in Text: Theories and Applications. Stanford, US.
• Strapparava, C. and R. Mihalcea: 2007, ‘SemEval-2007 Task 14:
Affective Text’. In: 4th International Workshop on Semantic
Evaluations (SemEval 2007). Prague, Czech Republic.
• Turney, P.: 2002, ‘Thumbs up or thumbs down? Semantic orientation
applied to unsupervised classification of reviews’. In: 40th Annual
Meeting of the Association of Computational Linguistics (ACL-02).
pp. 417–424.
Bibliography
• Whitelaw, C., N. Garg, and S. Argamon: 2005, ‘Using Appraisal
Taxonomies for Sentiment Analysis’. In: Proceedings of CIKM-05, the
ACM SIGIR Conference on Information and Knowledge Management.
Bremen, Germany.
• Wiebe, J.: 2002, ‘Instructions for Annotating Opinions in Newspaper
Articles’. Technical Report TR-01-101, University of Pittsburgh,
Department of Computer Science, Pittsburgh, PA.
• Wiebe, J., E. Breck, C. Buckley, C. Cardie, P. Davis, B. Fraser, D.
Litman, D. Pierce, E. Riloff, T. Wilson, D. Day, and M. Maybury:
2003, ‘Recognizing and Organizing Opinions Expressed in World
Press’. In: Proceedings of the AAAI Spring Symposium on New
Directions in Question Answering.
• Wiebe, J., R. Bruce, M. Bell, M. Martin, and T. Wilson: 2001a, ‘A
Corpus Study of Evaluative and Speculative Language’. In:
Proceedings of the 2nd ACL SIGdial Workshop on Discourse and
Dialogue. Aalborg, Denmark.
• Wiebe, J. and E. Riloff: 2005, ‘Creating Subjective and Objective
Sentence Classifiers from Unannotated Texts’. In: Proceedings of
CICLing-05, International Conference on Intelligent Text Processing
and Computational Linguistics, Vol. 3406 of Lecture Notes in
Computer Science. Mexico City, MX, pp. 475–486.
Bibliography
• Wiebe, J., T. Wilson, and M. Bell: 2001b, ‘Identifying Collocations
for Recognizing Opinions’. In: Proceedings of the ACL Workshop on
Collocation. Toulouse, France.
• Wiebe, J., T. Wilson, and C. Cardie: 2005, ‘Annotating expressions
of opinions and emotions in language’. Language Resources and
Evaluation 39(2–3), 165–210.
• Wiebe, J. M., R. F. Bruce, and T. P. O’Hara: 1999, ‘Development and
Use of a Gold-Standard Data Set for Subjectivity Classifications’.
In: 37th Annual Meeting of the Association for Computational
Linguistics (ACL-99). pp. 246–253.
• Wilson, T. and J. Wiebe: 2003, ‘Annotating Opinions in the World
Press’. In: 4th SIGdial Workshop on Discourse and Dialogue (SIGdial-03).
• Wilson, T., J. Wiebe, and R. Hwa: 2006, ‘Recognizing Strong and Weak
Opinion Clauses’. Computational Intelligence 22(2), 73–99.
• Yu, H. and V. Hatzivassiloglou: 2003, ‘Towards Answering Opinion
Questions: Separating Facts from Opinions and Identifying the
Polarity of Opinion Sentences’. In: M. Collins and M. Steedman
(eds.): Proceedings of EMNLP-03, 8th Conference on Empirical Methods
in Natural Language Processing. Sapporo, Japan, pp. 129–136.