2017.06.21 - Opinion Mining - Research group Business Informatics

Central University of Las Villas, Cuba
Artificial Intelligence Lab
Computer Science Department
Opinion Mining
Prof. Leticia Arco García
[email protected]
Motivation
 Someone who wants to buy a car
 Looks for comments and reviews
 Someone who just bought a car
 Comments on it
 Writes about his experience
 Car manufacturer
 Gets feedback from customers
 Improves its products
 Adjusts marketing strategies
Opinions

Opinions are central to almost all human activities and are key
influencers of our behaviours

Our beliefs and perceptions of reality, and the choices we make,
are, to a considerable degree, conditioned upon how others see
and evaluate the world
When we need to make a decision
we often seek out the opinions of others
This is not only true for individuals
but also true for organizations
Social media on the Web
With the explosive growth of social media on the Web, individuals and
organizations are increasingly using the content in these media for
decision making
Contents
 Origin and definition
 Different levels of analysis
 Opinions: definition, types and main problems
 Sentiment analysis tasks
 Polarity detection: two approaches
 Other sentiment analysis proposals
 Lexical resources and available datasets
 Applications in Business Informatics
 Our results
 Challenges: SemEval-2017 and TASS 2017
Origin (1/2)
 Some earlier work on interpretation of metaphors, sentiment
adjectives, subjectivity, view points and affects (1990-1999)
 Learning subjective adjectives from corpora (Wiebe, 2000)
 Yahoo! for Amazon: Extracting market sentiment from stock
message boards (Das and Chen, 2001)
 An operational system for detecting and tracking opinions in
on-line discussion (Tong, 2001)
Origin (2/2)
Sentiment analysis
 Sentiment analysis: Capturing favourability using natural
language processing (Nasukawa and Yi, 2003)
Opinion mining
 Mining the peanut gallery: Opinion extraction and semantic
classification of product reviews (Dave et al., 2003)
Definition
 Sentiment analysis is the field of study that analyses people’s
opinions, sentiments, evaluations, appraisals, attitudes and emotions
 towards entities such as products, services, organizations,
individuals, issues, events and topics
 and their attributes
Related names: review mining, sentiment analysis, subjectivity analysis,
sentiment mining, affect analysis, emotion analysis, opinion extraction,
opinion mining
Different levels of analysis (1/2)
 Document level
 Classifies whether a whole opinion document expresses a positive or
negative sentiment
 Assumption: Each document expresses opinions on a single entity
 Sentence level
 Determines whether each sentence expresses a positive, negative or
neutral opinion
 Closely related to subjectivity classification
“The iPhone’s call quality is good, but its battery life is short.”
Neither level discovers what exactly people liked and did not like
Different levels of analysis (2/2)
 Entity and aspect level
 Performs finer-grained analysis
 Directly looks at the opinion itself
 Goal: discover sentiments on entities and/or their aspects
“The iPhone’s call quality is good, but its battery life is short.”
Entity: iPhone
Aspects: call quality and battery life
Sentiment on iPhone’s call quality: positive
Sentiment on its battery life: negative
Opinion
 The meaning of opinion itself is still very broad
 Sentiment analysis mainly focuses on opinions which express
or imply positive or negative sentiments
[Figure: an example opinion document annotated with its date, opinion source, opinion holders, topic, and sentiment orientations (opinion polarities); items 1, 2, 3 and 4 are positive and item 5 is negative]
Opinion definition
 An opinion is a quintuple (e_i, a_ij, s_ijkl, h_k, t_l) where:
 e_i is the name of an entity
 a_ij is an aspect of e_i
 s_ijkl is the sentiment on aspect a_ij of entity e_i
 h_k is the opinion holder
 t_l is the time when the opinion is expressed by h_k
 The sentiment s_ijkl is positive, negative, or neutral, or
expressed with different strength/intensity levels, e.g., 1 to 5
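The quintuple can be written down as a small record type. The sketch below is only an illustration: the field names, the integer encoding of the sentiment (+1, -1, 0), the holder name "reviewer" and the date are assumptions, and the two records restate the iPhone example from the entity and aspect level slide.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Opinion:
    entity: str     # e_i: name of the entity
    aspect: str     # a_ij: aspect of the entity
    sentiment: int  # s_ijkl: +1 positive, -1 negative, 0 neutral (assumed encoding)
    holder: str     # h_k: opinion holder
    time: date      # t_l: when the opinion was expressed


# Hypothetical holder and date, used only to fill the last two fields.
review_date = date(2017, 6, 21)
opinions = [
    Opinion("iPhone", "call quality", +1, "reviewer", review_date),
    Opinion("iPhone", "battery life", -1, "reviewer", review_date),
]


def aspect_summary(items):
    """Add up the sentiment values per (entity, aspect) pair."""
    summary = {}
    for op in items:
        key = (op.entity, op.aspect)
        summary[key] = summary.get(key, 0) + op.sentiment
    return summary
```

Calling aspect_summary(opinions) gives one total per aspect, which is the raw material for an aspect-based opinion summary.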
Types of opinions (1/2)
 Regular opinions
 Express a sentiment only on a particular entity or an aspect of the entity
 Direct opinion
“Belgian chocolates taste very good.”
 Indirect opinion
“After injection of the drug, my joints felt worse.”
 Comparative opinions
 Compare multiple entities based on some of their shared aspects
“Belgian beers taste much better than Cuban beers.”
Types of opinions (2/2)
 Explicit opinion
 Is a subjective statement that gives a regular or comparative opinion
“UHASSELT is a very good university.”
 Implicit opinion
 Is an objective statement that implies a regular or comparative opinion
“The battery life of Nokia phones is longer than Samsung
phones.”
Explicit opinions are easier to detect and to
classify than implicit opinions
Sentiment analysis is an NLP problem
 It touches every aspect of NLP
 Co-reference resolution
 Negation handling
 Word sense disambiguation
 Sentiment analysis is a highly restricted NLP problem
 It does not need to fully understand the semantics of each
sentence or document
 It only needs to understand some aspects of it
 Positive or negative sentiments
 Their target entities
 Their topics
Opinion Mining is more difficult than Text Mining
 Informal language
 Abbreviations
 Emoticons
 Spelling and typographical errors
 Ironic and sarcastic language
 Language knowledge level
 Cultural level
These characteristics make opinion mining more difficult than
other text mining tasks
Sentiment analysis tasks
Objective of sentiment analysis: Given an opinion document,
discover all opinion quintuples
1. Entity extraction and categorization
2. Aspect extraction and categorization
3. Opinion holder extraction and categorization
4. Time extraction and standardization
5. Aspect sentiment classification
6. Opinion quintuple generation
Polarity detection: two approaches
 Semantic approaches
 Characterized by the use of dictionaries of words (lexicons) with semantic
orientation of polarity or opinion
 Computational learning techniques
 Consist of training a classifier with a supervised learning algorithm
on a collection of annotated texts
Words expressing feeling or opinion
Positive opinion: good, wonderful, amazing, …
Negative opinion: bad, poor, terrible, …
Sentiment lexicon or opinion lexicon
(sentiment words, opinion words, polar words, opinion-bearing words)
Base type
Comparative type
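A semantic (lexicon-based) polarity detector can be sketched in a few lines. The word lists below are the example words from this slide; the negation list, the one-word negation window and the tokenizer are assumptions added for illustration, not part of the original material.

```python
import re

POSITIVE = {"good", "wonderful", "amazing"}
NEGATIVE = {"bad", "poor", "terrible"}
NEGATIONS = {"not", "no", "never"}  # assumed sentiment shifters


def polarity(text):
    """Sum lexicon hits; flip a hit when the previous token is a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        value = (token in POSITIVE) - (token in NEGATIVE)
        if value and i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A sentence without any lexicon word is reported as neutral, which is one reason a lexicon alone is not sufficient.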
Approaches to compile sentiment words
 Manual approach
 Labour intensive and time consuming
 Useful for final check in automated approaches
 Dictionary-based approach
 Bootstraps from a few seed sentiment words using the synonym and
antonym structure of a dictionary
 Corpus-based approach
1. Given a seed list of known sentiment words, discover other sentiment
words and their orientations from a domain corpus
2. Adapt a general-purpose sentiment lexicon to a new one using a domain
corpus for sentiment analysis applications in the domain
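The dictionary-based approach can be sketched as a breadth-first expansion from the seed words: synonyms inherit the orientation of the word that reached them, antonyms receive the opposite one. The toy SYNONYMS and ANTONYMS tables are hypothetical stand-ins for a real dictionary such as WordNet.

```python
from collections import deque

# Hypothetical toy dictionary; a real system would query a thesaurus.
SYNONYMS = {
    "good": ["fine", "great"],
    "great": ["superb"],
    "bad": ["poor"],
    "poor": ["awful"],
}
ANTONYMS = {"good": ["bad"], "superb": ["awful"]}


def bootstrap(seeds):
    """Expand {word: +1 or -1} seeds through synonym and antonym links."""
    lexicon = dict(seeds)
    queue = deque(lexicon)
    while queue:
        word = queue.popleft()
        orientation = lexicon[word]
        neighbours = [(w, orientation) for w in SYNONYMS.get(word, [])]
        neighbours += [(w, -orientation) for w in ANTONYMS.get(word, [])]
        for other, value in neighbours:
            if other not in lexicon:  # keep the first orientation found
                lexicon[other] = value
                queue.append(other)
    return lexicon
```

Starting from the single seed "good", the sketch labels all seven words in the toy dictionary. Errors propagate just as easily in a real dictionary, which is why a manual final check remains useful.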
Sentiment lexicon
Although sentiment words and phrases are
important for sentiment analysis, only using
them is far from sufficient
Sentiment lexicon is necessary but not
sufficient for sentiment analysis
Some problems of feeling words
 They may have opposite orientations in different application domains
 A sentence containing sentiment words may not express any sentiment
 Sarcastic sentences with or without sentiment words are hard to deal with
 Many sentences without feeling words can also imply opinions
“It is a large dictionary, covering thousands of words.”
“He has put on weight, and is now quite large.”
“He likes to talk large, but I think he exaggerates.”
“Can you tell me which camera is good?”
“If I can find a good camera in the shop, I'll buy it.”
“I HATE to admit it but, I LOVE admitting things.”
“He killed the ant before it could bite him.”
“I liiikeee winter, summer does not arrive yet :-(”
“What a great car! It stopped working in two days.”
“This washer uses a lot of water.”
Sentiment classification using supervised learning (1/3)
 Two-class classification problem: positive and negative
 Training and testing data used are normally product reviews
 A review with 4 or 5 stars is considered a positive review
 A review with 1 to 2 stars is considered a negative review
 First approaches:
 Naïve Bayes classification
 Support Vector Machines
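As a rough illustration of this setup, the sketch below labels reviews from their star ratings and trains a small multinomial Naïve Bayes classifier with add-one smoothing. The training reviews are invented, whitespace tokenization is a simplification, and skipping 3-star reviews is an assumption (the slide only defines the positive and negative classes).

```python
import math
from collections import Counter, defaultdict


def label_from_stars(stars):
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return None  # 3-star reviews are skipped here (assumption)


def train(reviews):
    """reviews: iterable of (text, stars) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, stars in reviews:
        label = label_from_stars(stars)
        if label is None:
            continue
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training data, for illustration only.
model = train([
    ("great camera amazing pictures", 5),
    ("good battery wonderful screen", 4),
    ("terrible lens bad battery", 1),
    ("poor screen bad pictures", 2),
    ("average camera", 3),
])
```

With this toy model, classify("amazing screen", model) favours the positive class and classify("bad lens", model) the negative one.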
Sentiment classification using supervised learning (2/3)
Like other supervised machine learning applications, the key for
sentiment classification is the engineering of a set of effective features

Terms and their frequency

Part of speech

Sentiment words and phrases

Rules of opinions

Sentiment shifters

Syntactic dependency
Sentiment classification using supervised learning (3/3)
Apart from classification of positive and negative sentiments,
researchers also studied the problem of predicting the rating
scores (e.g., 1–5 stars) of reviews
Regression problem
Subjectivity classification
 Objective sentences
 Express factual information from sentences
 Subjective sentences
 Express subjective views and opinions
Is subjectivity equivalent to sentiment?
“I think that he went home.”
“The phone broke in two days.”
Emotion
 Emotions are our subjective feelings and thoughts
 Six primary emotions: love, joy, surprise, anger, sadness and
fear
 Opinions that we study in sentiment analysis are mostly
evaluations
 Rational evaluations are from rational reasoning, tangible beliefs, and
utilitarian attitudes
 Emotional evaluations are from non-tangible and emotional responses
to entities which go deep into people’s state of mind
 Five sentiment ratings
 emotional negative (-2), rational negative (-1), neutral (0), rational
positive (+1), and emotional positive (+2)
Aspect-based sentiment analysis
 Such methods are typically unsupervised
 Sentiment lexicon
 Composite expressions
 Rules of opinions
 Sentence parse tree
 Sentiment shifters
 But-clauses
 Aggregate opinions
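A minimal sketch of the unsupervised idea, using only a sentiment lexicon and the but-clause rule: the sentence is split at "but", and each clause's lexicon score is attached to the aspects mentioned in that clause. The aspect list is assumed to be known in advance, and the small lexicon is hypothetical and domain-specific (for example, "short" is treated as negative because it describes battery life).

```python
import re

ASPECTS = ["call quality", "battery life"]  # assumed to be already extracted
LEXICON = {"good": 1, "great": 1, "short": -1, "poor": -1}  # hypothetical


def aspect_sentiments(sentence):
    """Score each aspect with the sentiment words found in its own clause."""
    results = {}
    for clause in re.split(r"\bbut\b", sentence.lower()):
        tokens = re.findall(r"[a-z]+", clause)
        score = sum(LEXICON.get(token, 0) for token in tokens)
        for aspect in ASPECTS:
            if aspect in clause:
                results[aspect] = results.get(aspect, 0) + score
    return results
```

For the sentence "The iPhone's call quality is good, but its battery life is short." the sketch gives call quality a positive score and battery life a negative one. A real system would also need parse trees, sentiment shifters and composite expressions.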
Aspect extraction approaches
 Extraction based on frequent nouns and noun phrases
 Extraction by exploiting opinion and target relations
 Extraction using supervised learning
 Extraction using topic modelling
Semantic classification and deep learning
Grouping aspects into categories
 Aspect expressions need to be grouped into synonymous
aspect categories
 Each category represents a unique aspect
 Same aspect for phones: “call quality” and “voice quality”
Many aspect expressions are multi-word phrases, which cannot be easily
handled with dictionaries
 Grouping such aspect expressions from the same aspect is
critical for opinion analysis
 WordNet and other thesauri
 “movie” and “picture” are synonyms in movie reviews
 “picture” is more likely to be synonymous to “photo” while
“movie” to “video” in camera reviews
Opinion summarization (1/3)
Different entity names
 Aspect-based opinion summary
Different aspect names
Opinion summarization (2/3)
 Visualization of aspect-based summary of opinions on a digital
camera
Opinion summarization (3/3)
 Visualization of aspect-based summaries of opinions
Opinion spammers
A key feature of social media is that it enables anyone from anywhere in the
world to freely express his/her views and opinions without disclosing
his/her true identity and without the fear of undesirable consequences
Opinion spammers
Friends and family
Competitors
Company employees
Genuine customers
Businesses that provide fake
review writing services
Some businesses give discounts and even full refunds to some of their customers on
the condition that the customers write positive reviews for them
Agencies and political organizations may employ people to post messages to
secretly influence social media conversations and to spread lies and disinformation
Opinion spammers vs opinion spam detection
• Review content: linguistic features
• Meta-data about the review: user-id, star rating, time, host IP address, …
• Product information
Opinion spam detection
Supervised
There is no labelled training data for learning
• Exploit duplicate reviews
• Create features
Unsupervised
• Spam detection based on atypical behaviours
• Spam detection using review graph
Group spam detection: Frequent pattern mining
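Exploiting duplicate reviews requires a way to find them. One common option, sketched here, compares word bigrams (shingles) with Jaccard similarity. The shingle size, the 0.7 threshold and the sample reviews are assumptions chosen for the example, not values from the slides.

```python
from itertools import combinations


def shingles(text, n=2):
    """Set of word n-grams of a review."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicates(reviews, threshold=0.7):
    """Return pairs of review ids whose shingle sets overlap strongly."""
    sets = {rid: shingles(text) for rid, text in reviews.items()}
    return [
        (r1, r2)
        for r1, r2 in combinations(sorted(sets), 2)
        if jaccard(sets[r1], sets[r2]) >= threshold
    ]


# Invented reviews, for illustration only.
sample = {
    "r1": "this camera is great and takes great pictures",
    "r2": "this camera is great and takes great photos",
    "r3": "battery died after two days",
}
```

Pairs found this way can serve as likely spam examples when no hand-labelled training data exists.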
Cross-domain sentiment classification
 A classifier trained using opinion documents from one domain often
performs poorly on test data from another domain
 Words and even language constructs used in different domains for
expressing opinions can be quite different
 The same word in one domain may mean positive but in another
domain may mean negative
Domain adaptation or transfer learning is needed
• A small amount of labelled training data for the new domain
• No labelled data for the new domain
Lifelong machine learning
• Learn as humans do
• Retain learned knowledge from previous tasks and use it to help future
learning
• Is a continuous learning process where the learner has performed a
sequence of learning tasks
Cross-language sentiment classification
 Perform sentiment classification of opinion documents in
multiple languages
 Researchers from different countries want to build sentiment
analysis systems in their own languages.
 Companies want to know and compare consumer opinions about
their products and services in different countries
Co-training methods
Lexical resources
Opinion search and retrieval
 Find public opinions about a particular entity or an aspect of
the entity
 Find customer opinions about a digital camera
 Find opinions of a person or organization (i.e., opinion holder)
about a particular entity or an aspect of the entity (or topic)
 Find Charles Michel’s opinion about terrorism
Lexical resources
 WordNet Affect
 SentiWordNet
 General Inquirer
WordNet Affect

WordNet-Affect is an extension of WordNet Domains, including a
subset of synsets suitable to represent affective concepts correlated
with affective words

Affective labels (a-labels) are assigned to a number of WordNet
synsets
WordNet Affect: Terms and affective categories
Some terms related to "university" through their emotional
categories
SentiWordNet
 SentiWordNet is a lexical resource for opinion mining
 SentiWordNet assigns to each synset of WordNet three
sentiment scores:
 Positivity
 Negativity
 Objectivity
 Generating SentiWordNet
1. A weak-supervision, semi-supervised learning step
2. A random-walk step
General Inquirer
 Harvard categories:
 Positive, Negative, Strong, Weak, Active, Passive, …
 Pleasure, Pain, Feel, Arousal, Virtue, Emotion, …
 New categories based on social cognition
 Lasswell value dictionary categories
Some publicly available datasets
 Stanford large movie dataset
 Movie
 TripAdvisor
 TBOD
 ISEAR
 DUC data
 Spinn3r dataset
 HASH
 EMOT
 OpinRank dataset
Opinion mining and enterprises
 Enterprises are open and flexible in the use of technological
tools to “sense” customers and market
 Acquiring information in real-time allows the company to be
agile and to develop ”Sense and Response” capabilities
 An agile enterprise responds immediately to any internal or
external event, such as customer demand or customer opinions
 Knowing what the customer thinks of a given product/service
helps top management to introduce improvements in
processes and products
 Customer opinions represent potential knowledge to be
considered for the acquisition of competitive advantages
Opinions are very important for decision making
Gretzel and Yoo (2008) demonstrate that 97.7% of travel
booking decisions are made after consulting other travellers’
opinions, of which 77.9% involve the use of customer reviews
as a source of information helping to make a better decision
Gretzel, U. & Yoo, K. H. (2008) Use and Impact of Online Travel Reviews
Information and Communication Technologies in Tourism. Innsbruck, Austria.
How can a sentiment analysis tool help my brand?
 Better understand the motivations behind sentiment
 Learn from social posts, news, reviews, and more
 Benchmark against competitors
 Track purchase intent
 Evaluate campaign impact
 Analyse product launch response
Some sentiment analysis tools
 Opinion Crawl
 MeaningCloud
 Trackur
 SAS
 OpenText
 StatSoft
 NetOwl Extractor
 Meltwater
Cloud‐based Event‐processing Architecture for Opinion
Mining (1/2)
 Smart distributed architecture for opinion mining on internet-based
content that answers key challenges:
 Integrating heterogeneous data sources
 Adapting to events through dynamic system configuration
 A novel approach of semantic complex event processing in a
cloud environment capturing different levels of information:
 Event data
 Content from various heterogeneous sources
 Distributed sources
 Dynamic co-reference resolution
Cloud‐based Event‐processing Architecture for Opinion
Mining (2/2)
1. Topic modelling and sentiment analysis
2. Deep linguistic and interlinking analysis
3. Transfer learning and active learning of opinions
4. Cloud computing and event processing
Enterprise information fusion for real-time business
intelligence (1/2)
Correlate the external events in real-time with known facts about the
internal operations and transactions of the enterprise and its ecosystem
Enterprise information fusion for real-time business
intelligence (2/2)
News event detection from Twitter
Identifying customer preferences about tourism products
using an aspect-based opinion mining approach
A novel application mining for competitive intelligence
A new method to extract opinion patterns from customer reviews
and its application to evaluate resources or internal factors in an
enterprise
1. Opinion gathering
2. Text pre-processing
3. Factor and polarity detection
4. Internal factor evaluation
Customer voice sensor
Call centre is an important intermediary between enterprise and customers
A comprehensive opinion mining system for call centre conversation
• It helps customers to solve the problems
• It allows the enterprise to deeply analyse the customer's voice and
make a distinct market positioning
Mobile application for customers’ reviews opinion mining
The Power of Text-mining in Business Process Management
(1/2)
The Power of Text-mining in Business Process Management
(2/2)
PosNeg Opinion
Opinions
Identify terms
Disambiguate lexically each term
Obtain all meanings of each term
Classify each term in positive or negative
Evaluate the opinion
Improving SentiWordNet 3.0
84342 terms
Preprocessing stage: Split terms considering if they have
polarity values assigned or not
5037 terms with polarity values, 79305 terms without
Stage 1: Assign polarity values considering the synonyms
of terms without assigned polarity values
51027 terms assigned, 28278 terms remaining
Stage 2: Assign inverse polarity values considering the
antonyms of terms without assigned polarity values
5678 terms assigned, 22600 terms remaining
Stage 3: Assign polarity values considering the synonyms
of terms with assigned polarity values
5770 terms assigned, 16830 terms remaining
Stage 4: Assign inverse polarity values considering the
antonyms of terms with assigned polarity values
1539 terms assigned, 15291 terms remaining
Result: 69051 terms with assigned polarity values,
15291 terms without assigned polarity values
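The counts reported for the four stages are consistent with each other, which a short check confirms. The numbers are taken from the slide; the variable names are only illustrative.

```python
TOTAL_TERMS = 84342
INITIAL_WITH_POLARITY = 5037
ASSIGNED_PER_STAGE = [51027, 5678, 5770, 1539]  # stages 1 to 4

# Terms still lacking a polarity value after the preprocessing split.
remaining = TOTAL_TERMS - INITIAL_WITH_POLARITY
remaining_after_stage = []
for assigned in ASSIGNED_PER_STAGE:
    remaining -= assigned
    remaining_after_stage.append(remaining)

with_polarity = TOTAL_TERMS - remaining
coverage = with_polarity / TOTAL_TERMS
print(remaining_after_stage, with_polarity, round(coverage, 3))
```

The share of terms with a polarity value therefore grows from roughly 6% before stage 1 to roughly 82% after stage 4.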
SpanishSentiWordNet
Example entry:
agresor n 09195176 09158637 09848308 attacker assailant aggressor
assaulter aggressor robber
• agresor n: the Spanish term and its POS label
• 09195176 09158637 09848308: intralinguistic index
• attacker assailant aggressor assaulter aggressor robber: English meaning of the term
Procedure:
1. Look up in Improved SentiWordNet 3.0 the negative and positive polarities of each meaning
2. Evaluate the polarity of the term by adding the
positive and negative polarities of its meanings
3. Store the negative and positive polarities of the Spanish term in SpanishSentiWordNet
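The aggregation step, adding the positive and negative polarities of a term's meanings, can be sketched as follows. The three index values come from the "agresor" example; the (positive, negative) scores attached to them are invented for illustration and are not real SentiWordNet values.

```python
# Hypothetical (positive, negative) scores per meaning; NOT real data.
MEANING_POLARITY = {
    "09195176": (0.0, 0.5),
    "09158637": (0.125, 0.25),
    "09848308": (0.0, 0.625),
}


def term_polarity(meaning_ids, scores):
    """Add up the positive and the negative scores of a term's meanings."""
    known = [scores[m] for m in meaning_ids if m in scores]
    positive = sum(pos for pos, _ in known)
    negative = sum(neg for _, neg in known)
    return positive, negative


agresor = term_polarity(["09195176", "09158637", "09848308"], MEANING_POLARITY)
```

Meanings without a score are skipped in this sketch; whether the sums are normalised afterwards is not stated on the slide.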
Topic detection assisting polarity detection
Example review:
We stay at Hilton for 4 nights last march. It was a
pleasant stay. We got a large room with 2 double
beds and 2 bathrooms, The TV was Ok, a 27' CRT
Flat Screen. The concierge was very friendly when
we need. The room was very cleaned when we
arrived, we ordered some pizzas from room
service and the pizza was Ok also. The main Hall
is beautiful. The breakfast is charged, 20 dollars,
kinda expensive. The internet access (WiFi) is
charged, 13 dollars/day. Pros: Low rate price,
huge rooms, close to attractions at Loop, close to
metro station. Cons: Expensive breakfast, Internet
access charged. Tip: When leaving the building,
always use the Michigan Av exit. Its a great view.
Highlighted segments:
“We got a large room with 2 double beds and 2
bathrooms, The TV was Ok, a 27' CRT Flat Screen.”
“The concierge was very friendly when we need.”
“The room was very cleaned when we arrived”
“The breakfast is charged, 20 dollars, kinda expensive.”
Schema for topic segmentation and detection
Input: textual corpora
1. Identify textual units (output: textual units)
2. Pre-process (output: tokens)
3. Represent textual units (vectors, graphs, probabilistic distribution)
4. Segment (output: segments)
5. Represent segments (vectors, graphs, probabilistic distribution)
6. Cluster segments (output: segment clusters, i.e. topics)
7. Label segment clusters
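The last steps of the schema (represent segments, cluster segments, label segment clusters) can be illustrated with a deliberately simple sketch: bag-of-words vectors, cosine similarity and a greedy single-pass clustering. This is not the method implemented in OpinionTopicDetection; the stopword list, the 0.2 threshold and the most-frequent-word labelling are assumptions.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "was", "is", "a", "we", "when", "very", "and", "with", "got"}


def vectorize(segment):
    """Bag-of-words vector of a segment, without stopwords."""
    tokens = re.findall(r"[a-z]+", segment.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(segments, threshold=0.2):
    """Attach each segment to the most similar cluster, or open a new one."""
    clusters = []  # list of (summed vector, member segments)
    for segment in segments:
        vec = vectorize(segment)
        best = max(clusters, key=lambda c: cosine(vec, c[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= threshold:
            best[0].update(vec)
            best[1].append(segment)
        else:
            clusters.append((vec, [segment]))
    return [members for _, members in clusters]


def label(members):
    """Label a cluster with its most frequent non-stopword."""
    counts = Counter()
    for member in members:
        counts.update(vectorize(member))
    return counts.most_common(1)[0][0]


segments = [
    "We got a large room with 2 double beds and 2 bathrooms",
    "The room was very cleaned when we arrived",
    "The breakfast is charged, 20 dollars, kinda expensive",
    "Cons: Expensive breakfast, Internet access charged",
]
topics = cluster(segments)
```

On these four segments from the hotel review, the sketch groups the two room sentences together and the two breakfast sentences together.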
Framework OpinionTopicDetection
Desktop application OpinionTD
Topics and corresponding labels
Open issues and future directions (1/2)
 Data collected from various sources are often very noisy,
misspelt and unstructured
 There is no universal opinion grading system across
sentiment dictionaries
 Online discussions and political discussions often contain ironic
and sarcastic sentences
 For better product comparison, we should compare a set of
products with respect to their common aspects
 The lack of a proper review spam dataset is a major obstacle
to opinion spam detection
Open issues and future directions (2/2)
 Very few attempts have been made to utilize the potential of
optimization techniques for feature selection
 There is a lack of opinion mining systems for non-English
languages
 Cross-domain sentiment analysis is still a major challenge
 Aspect-level sentiment analysis is needed for
comparative visualization of similar products
 The main challenge in review helpfulness is the validation
of the proposed methods
Challenges
SemEval 2017
 Detecting sentiment, humor, and truth
 Task 4: Sentiment Analysis in Twitter
 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News
 Task 6: #HashtagWars: Learning a Sense of Humor
 Task 7: Detection and Interpretation of English Puns
 Task 8: RumourEval: Determining rumour veracity and support for
rumours
TASS 2017
 Task 1: Sentiment analysis at tweet level
 Task 2: Aspect-based sentiment analysis
Thanks!
Questions, ideas, suggestions, comments, …