An Unsupervised Aspect-Sentiment Model for

An Unsupervised Aspect-Sentiment Model
for Online Reviews
Samuel Brody
Department of Biomedical Informatics
Columbia University
[email protected]
May 2010
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
1 / 42
Outline
1
Introduction
2
Aspect
Prev. Approaches
Methodology
Experiments
3
Sentiment
Prev. Approaches
Methodology
Experiments
4
Conclusions
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
2 / 42
Outline
1
Introduction
2
Aspect
Prev. Approaches
Methodology
Experiments
3
Sentiment
Prev. Approaches
Methodology
Experiments
4
Conclusions
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
3 / 42
Unsupervised Learning and NLP
Unsupervised Methods
+ don’t require annotation
+ fit themselves to the task and data
+ suitable for plentiful data / low knowledge
scenarios
+ can harness knowledge sources
+ intuitive and cognitively plausible
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
4 / 42
An Unsupervised Aspect-Sentiment Model
for Online Reviews
Brody & Elhadad (NAACL ’10)
Detecting Salient Aspects in Online Reviews of
Health Providers
Brody & Elhadad (AMIA ’10 - under review)
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
5 / 42
Online Reviews
What we have:
overall score
details in free-form text
What we need:
the relevant information in the text, for
summary
comparison
pro/con lists
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
6 / 42
Online Reviews
What we have:
overall score
details in free-form text
What we need:
the relevant information in the text, for
summary
comparison
pro/con lists
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
6 / 42
Why Unsupervised?
manual annotation may not be feasible
relevant information is unpredictable
varying ways of expressing similar meaning
spelling errors and typos
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
7 / 42
“The Pitch”
Aspect-Sentiment Model
simple and elegant
unsupervised - no labeled training data
effective
flexible across domains
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
8 / 42
Outline
1
Introduction
2
Aspect
Prev. Approaches
Methodology
Experiments
3
Sentiment
Prev. Approaches
Methodology
Experiments
4
Conclusions
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
9 / 42
Previous Approaches
Keyword Based
manual annotation / ask users
IE techniques (TF-IDF)
use special lexicons
plus
adapt across domains
keyword expansion and clustering
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
10 / 42
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
11 / 42
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
11 / 42
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
12 / 42
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
12 / 42
Latent Dirichlet Allocation (LDA) - Blei et al. (2003)
The William Randolph Hearst Foundation will give $1.25 million to Lincoln Center ,
Metropolitan Opera Co. , New York Philharmonic and Juilliard School . “Our board
felt that we had a real opportunity to make a mark on the future of the performing arts
with these grants an act every bit as important as our traditional areas of support in
health, medical research , education and the social services ,” Hearst Foundation President Randolph A. Hearst said Monday in announcing the grants . Lincoln Center’s share
will be $200,000 for its new building , which will house young artists and provide new
public facilities . The Metropolitan Opera Co. and New York Philharmonic will receive
$400,000 each. The Juilliard School , where music and the performing arts are taught
, will get $250,000 . The Hearst Foundation , a leading supporter of the Lincoln Center
Consolidated Corporate Fund , will make its usual annual $100,000 donation, too.
“Arts”
Sam Brody
“Budgets”
“Children”
“Education”
NEW
MILLION
CHILDREN
SCHOOL
FILM
TAX
WOMEN
STUDENTS
SHOW
PROGRAM
PEOPLE
SCHOOLS
MUSIC
BUDGET
CHILD
EDUCATION
MOVIE
BILLION
YEARS
TEACHERS
PLAY
FEDERAL
FAMILIES
HIGH
MUSICAL
YEAR
WORK
PUBLIC
BEST
SPENDING
PARENTS
TEACHER
ACTOR
NEW
SAYS
BENNETT
FIRST An Unsupervised
STATE Aspect-Sentiment
FAMILY
Model for Online MANIGAT
Reviews
May 2010
13 / 42
Latent Dirichlet Allocation (LDA) - Blei et al. (2003)
The William Randolph Hearst Foundation will give $1.25 million to Lincoln Center ,
Metropolitan Opera Co. , New York Philharmonic and Juilliard School . “Our board
felt that we had a real opportunity to make a mark on the future of the performing arts
with these grants an act every bit as important as our traditional areas of support in
health, medical research , education and the social services ,” Hearst Foundation President Randolph A. Hearst said Monday in announcing the grants . Lincoln Center’s share
will be $200,000 for its new building , which will house young artists and provide new
public facilities . The Metropolitan Opera Co. and New York Philharmonic will receive
$400,000 each. The Juilliard School , where music and the performing arts are taught
, will get $250,000 . The Hearst Foundation , a leading supporter of the Lincoln Center
Consolidated Corporate Fund , will make its usual annual $100,000 donation, too.
“Arts”
Sam Brody
“Budgets”
“Children”
“Education”
NEW
MILLION
CHILDREN
SCHOOL
FILM
TAX
WOMEN
STUDENTS
SHOW
PROGRAM
PEOPLE
SCHOOLS
MUSIC
BUDGET
CHILD
EDUCATION
MOVIE
BILLION
YEARS
TEACHERS
PLAY
FEDERAL
FAMILIES
HIGH
MUSICAL
YEAR
WORK
PUBLIC
BEST
SPENDING
PARENTS
TEACHER
ACTOR
NEW
SAYS
BENNETT
FIRST An Unsupervised
STATE Aspect-Sentiment
FAMILY
Model for Online MANIGAT
Reviews
May 2010
13 / 42
LDA as a Generative Model
Generating a document:
sample a topic distribution θd from Dirichlet(α)
repeatedly generate words wi by
sampling a topic ti ∼ Multinomial(θd )
sampling a word wi |ti ∼ Multinomial(φti )
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
14 / 42
Intuition
LDA Assumptions:
Documents are composed of a distribution of topics.
The distributions are skewed towards few topics (Dirichlet prior).
⇒ Words in same document are more likely from the same topic.
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
15 / 42
Inference via Sampling - Griffiths and Steyvers (2004)
Inference Procedure
1
random initialization
2
calculate word-topic distribution
3
calculate topic distribution for documents, based on the words
4
re-assign topics to word instances, based on document distrib.
5
repeat to convergence
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
16 / 42
Using LDA for Aspects
Global vs. Local Topics - MG-LDA
(Titov and McDonald, 2008a)
"London, Pound, Tube" vs.
"Paris, Euro, Metro"
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
17 / 42
Using LDA for Aspects
Global vs. Local Topics - MG-LDA
(Titov and McDonald, 2008a)
"London, Pound, Tube" vs.
"Paris, Euro, Metro"
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
17 / 42
Using LDA for Aspects
Global vs. Local Topics - MG-LDA
(Titov and McDonald, 2008a)
"London, Pound, Tube" vs.
"Paris, Euro, Metro"
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
17 / 42
Using LDA for Aspects
Global vs. Local Topics - MG-LDA
(Titov and McDonald, 2008a)
"London, Pound, Tube" vs.
"Paris, Euro, Metro"
Mapping Topics to Aspects - MAS
(Titov and McDonald, 2008b)
100 topics → few rateable aspects
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
17 / 42
Using LDA for Aspects
Global vs. Local Topics - MG-LDA
(Titov and McDonald, 2008a)
"London, Pound, Tube" vs.
"Paris, Euro, Metro"
Mapping Topics to Aspects - MAS
(Titov and McDonald, 2008b)
100 topics → few rateable aspects
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
17 / 42
Topics ⇒ Rateable Aspects
Problem: global vs. local topics
Solution: use sentences instead of documents
Problem: mapping topics to aspects
Solution: use few topics corresponding directly to aspects
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
18 / 42
Topics ⇒ Rateable Aspects
Problem: global vs. local topics
Solution: use sentences instead of documents
Problem: mapping topics to aspects
Solution: use few topics corresponding directly to aspects
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
18 / 42
Topics ⇒ Rateable Aspects
Problem: global vs. local topics
Solution: use sentences instead of documents
Problem: mapping topics to aspects
Solution: use few topics corresponding directly to aspects
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
18 / 42
Model Order - Number of Aspects
P
F (C, Ĉ) =
1{Ci,j = Ĉi,j = 1, di , dj ∈ D̂}
P
i,j 1{Ci,j = 1, di , dj ∈ D̂}
i,j
Validation Procedure (for a given k ) :
1
For dataset D:
Run LDA with k topics on D to produce assignment matrix Ck .
Create random assignment matrix Rk (uniform assignment)
2
Sample random subset D i of size δ|D| from D.
Run LDA on D i to produce assignment matrix Cki .
Create random comparison matrix Rki (uniform assignments)
Calculate scorei (k ) = F (Cki , Ck ) − F (Rki , Rk )
3
4
Repeat 2nd step q times.
Return the average score over q iterations.
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
19 / 42
Outline
1
Introduction
2
Aspect
Prev. Approaches
Methodology
Experiments
3
Sentiment
Prev. Approaches
Methodology
Experiments
4
Conclusions
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
20 / 42
Data
Restaurants - 50,000 restaurant reviews from Citysearch NY
(http://newyork.citysearch.com/)
3,400 sentences annotated by Ganu et al. (2009)
Products - from Amazon (http://www.amazon.com/)
1,086 reviews for four leading netbooks
586 reviews for watches
Medical Professionals - 33,654 reviews of 12,898 practitioners in
NY state, from Rate MDs (http://www.ratemds.com)
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
21 / 42
Results in the Restaurant Domain
Inferred Aspect
Representative Words
Manual Aspect
Food - General
Wine & Drinks
Dishes
Bakery
menu, fresh, sushi, fish, chef, cuisine
wine, list, glass, drinks, beer, bottle
chicken, sauce, rice, cheese, spicy, salad,
hot, delicious, dessert, bagels, bread, chocolate
Food & Drink
Ambiance / Mood
Physical Atmosphere
great, atmosphere, wonderful, music, experience
bar, room, outside, seating, tables, cozy, loud
Atmosphere
Staff
Service
service, staff, friendly, attentive, busy, slow
table, order, wait, minutes, reservation, forgot
Staff
Value
portions, quality, worth, size, cheap
Price
Anec. - experience
Anec. - location
dinner, night, group, friends, date, family
out, back, definitely, around, walk, block
Anecdotes
Recommendation
Location
Misc. Description
best, top, favorite, city, NYC
restaurant, found, Paris, (New) York, location
place, eat, enjoy, big, often, stuff
Misc.
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
22 / 42
Results in the Restaurant Domain
Inferred Aspect
Representative Words
Manual Aspect
Food - General
Wine & Drinks
Dishes
Bakery
menu, fresh, sushi, fish, chef, cuisine
wine, list, glass, drinks, beer, bottle
chicken, sauce, rice, cheese, spicy, salad,
hot, delicious, dessert, bagels, bread, chocolate
Food & Drink
Ambiance / Mood
Physical Atmosphere
great, atmosphere, wonderful, music, experience
bar, room, outside, seating, tables, cozy, loud
Atmosphere
Staff
Service
service, staff, friendly, attentive, busy, slow
table, order, wait, minutes, reservation, forgot
Staff
Value
portions, quality, worth, size, cheap
Price
Anec. - experience
Anec. - location
dinner, night, group, friends, date, family
out, back, definitely, around, walk, block
Anecdotes
Recommendation
Location
Misc. Description
best, top, favorite, city, NYC
restaurant, found, Paris, (New) York, location
place, eat, enjoy, big, often, stuff
Misc.
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
22 / 42
Comparison with MAS - Titov and McDonald (2008b)
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
23 / 42
Products Domain
Netbooks
Aspect
Representative Words
Performance
Hardware
Memory
Software
Usability
Battery
Size
power, performance, mode, fan, quiet
drive, wireless, bluetooth, usb, speakers, webcam
ram, 2GB, upgrade, extra, 1GB, speed
using, office, software, installed, works, programs
internet, video, web, movies, music, email, play
battery, life, hours, time, cell, last
screen, keyboard, size, small, enough, big
..., Portability, Comparison, Mouse, General, Purchase, Looks, OS
Watches
Aspect
Representative Words
General
Appearance
Display
Performance
buy, perfect, husband, gift, beautiful, deal
looks, looking, look, nice, titanium, quality
read, little, date, display, digital, set
atomic, day, accurate, battery, solar, adjust
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
24 / 42
Medical Domain - Shared and Unique Aspects
Aspect
Recommendation
Manner
Anecdotal
Attention
Scheduling
Special
GP
X
X
X
X
–
Dent.
X
X
X
–
–
OBGYN
X
X
X
X
X
Psyc.
X
X
X
X
X
Perscrip.
& Tests
Cost
Pregnancy
–
Recommendation: years, best, family, recommend, practice
“My mother has been a patient here for approximately 2 years .. I have been very
impressed ...”
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
25 / 42
Medical Domain - Shared and Unique Aspects
Aspect
Recommendation
Manner
Anecdotal
Attention
Scheduling
Special
GP
X
X
X
X
–
Dent.
X
X
X
–
–
OBGYN
X
X
X
X
X
Psyc.
X
X
X
X
X
Perscrip.
& Tests
Cost
Pregnancy
–
Recommendation: years, best, family, recommend, practice
“My mother has been a patient here for approximately 2 years .. I have been very
impressed ...”
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
25 / 42
Medical Domain - Shared and Unique Aspects
Aspect
Recommendation
Manner
Anecdotal
Attention
Scheduling
Special
GP
X
X
X
X
–
Dent.
X
X
X
–
–
OBGYN
X
X
X
X
X
Psyc.
X
X
X
X
X
Perscrip.
& Tests
Cost
Pregnancy
–
Manner: staff, great, caring, knowledgeable, helpful
“Terrific doctor, excellent bed-side manner, staff needs to improve ...”
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
25 / 42
Medical Domain - Shared and Unique Aspects
Aspect
Recommendation
Manner
Anecdotal
Attention
Scheduling
Special
GP
X
X
X
X
–
Dent.
X
X
X
–
–
OBGYN
X
X
X
X
X
Psyc.
X
X
X
X
X
Perscrip.
& Tests
Cost
Pregnancy
–
Anecdotal: visit, problem, insurance, treatment, husband, symptoms
“ A week later I felt sick & when I called him, he then sent me to have a sonogram”
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
25 / 42
Medical Domain - Shared and Unique Aspects
Aspect
Recommendation
Manner
Anecdotal
Attention
Scheduling
Special
GP
X
X
X
X
–
Dent.
X
X
X
–
–
OBGYN
X
X
X
X
X
Psyc.
X
X
X
X
X
Perscrip.
& Tests
Cost
Pregnancy
–
Attention: time, questions, listens, appointment, talk
“prompt, fast, efficient .. he takes time for your questions ...”
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
25 / 42
Medical Domain - Shared and Unique Aspects
Aspect
Recommendation
Manner
Anecdotal
Attention
Scheduling
Special
GP
X
X
X
X
–
Dent.
X
X
X
–
–
OBGYN
X
X
X
X
X
Psyc.
X
X
X
X
X
Perscrip.
& Tests
Cost
Pregnancy
–
Schedule: time, office, out, appointment, minutes, phone, late
“You can call before you leave .... and they will let you know how long of a wait you
will have ...”
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
25 / 42
Medical Domain - Shared and Unique Aspects
Aspect
Recommendation
Manner
Anecdotal
Attention
Scheduling
Special
GP
X
X
X
X
–
Dent.
X
X
X
–
–
OBGYN
X
X
X
X
X
Psyc.
X
X
X
X
X
Perscrip.
& Tests
Cost
Pregnancy
–
Pregnancy : first, pregnancy, delivered, baby, hospital
“ My last pregnancy was high risk, and he stayed on top of everything ...”
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
25 / 42
Summary
automatically infer relevant aspects
aspects missed by annotators
Food & Drink, Staff & Service, Wireless, difference btw. doctors
flexible with regard to domain
leisure, products, professional services
handle non-keyword aspects
Food, Atmosphere, Bedside Manner, Scheduling, Portability
handle misspellings and domain specific words
desert, decour/decore, anti-pasta, creme-brule, sandwhich, omlette
six common misspellings of restaurant
Korma, Edamame, Dosa, Pho
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
26 / 42
Summary
automatically infer relevant aspects
aspects missed by annotators
Food & Drink, Staff & Service, Wireless, difference btw. doctors
flexible with regard to domain
leisure, products, professional services
handle non-keyword aspects
Food, Atmosphere, Bedside Manner, Scheduling, Portability
handle misspellings and domain specific words
desert, decour/decore, anti-pasta, creme-brule, sandwhich, omlette
six common misspellings of restaurant
Korma, Edamame, Dosa, Pho
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
26 / 42
Summary
automatically infer relevant aspects
aspects missed by annotators
Food & Drink, Staff & Service, Wireless, difference btw. doctors
flexible with regard to domain
leisure, products, professional services
handle non-keyword aspects
Food, Atmosphere, Bedside Manner, Scheduling, Portability
handle misspellings and domain specific words
desert, decour/decore, anti-pasta, creme-brule, sandwhich, omlette
six common misspellings of restaurant
Korma, Edamame, Dosa, Pho
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
26 / 42
Summary
automatically infer relevant aspects
aspects missed by annotators
Food & Drink, Staff & Service, Wireless, difference btw. doctors
flexible with regard to domain
leisure, products, professional services
handle non-keyword aspects
Food, Atmosphere, Bedside Manner, Scheduling, Portability
handle misspellings and domain specific words
desert, decour/decore, anti-pasta, creme-brule, sandwhich, omlette
six common misspellings of restaurant
Korma, Edamame, Dosa, Pho
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
26 / 42
Summary
automatically infer relevant aspects
aspects missed by annotators
Food & Drink, Staff & Service, Wireless, difference btw. doctors
flexible with regard to domain
leisure, products, professional services
handle non-keyword aspects
Food, Atmosphere, Bedside Manner, Scheduling, Portability
handle misspellings and domain specific words
desert, decour/decore, anti-pasta, creme-brule, sandwhich, omlette
six common misspellings of restaurant
Korma, Edamame, Dosa, Pho
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
26 / 42
Outline
1
Introduction
2
Aspect
Prev. Approaches
Methodology
Experiments
3
Sentiment
Prev. Approaches
Methodology
Experiments
4
Conclusions
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
27 / 42
Previous Approaches
Dictionary/Lexicon Based
Unsupervised (Graph, Constraints)
Hatzivassiloglou and McKeown (1997); Popescu
and Etzioni (2005)
Combined
Turney (2002); Fahrni and Klenner (2008); Jijkoun
and Hofmann (2009)
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
28 / 42
Methodology
Issues:
Dictionary/Lexicon Based
– lack of coverage
– misspellings and out-of-lexicon words
– require manual annotation
Unsupervised (Graph, Constraints)
– lack domain and aspect specificity
Combined
+ better coverage
– problems of both
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
29 / 42
Methodology
Design Principles:
start with a seed and propagate
allow for supervised or unsupervised seeds
keep it aspect-specific
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
30 / 42
Graph Representation
“The food was tasty and hot, but our waiter was not friendly.”
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
31 / 42
Graph Representation
“The food was tasty and hot, but our waiter was not friendly.”
(tasty, food), (hot, food), (not-friendly, waiter)
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
31 / 42
Graph Representation
“The food was tasty and hot, but our waiter was not friendly.”
(tasty, food), (hot, food), (not-friendly, waiter)
weight{(tasty,hot)}++;
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
31 / 42
Negation-Based Seed
Used:
explicit negation:
- friendly vs. not friendly
morphological indicators:
- (dis)courteous, (un)interesting, (in)expensive
Discarded:
disjunctions
- “convenient but noisy location”
- “cheap but tasty food”, “dainty but strong necklace”
antonyms from dictionary (WordNet)
- round vs. square, cool vs. warm
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
32 / 42
Negation-Based Seed
Used:
explicit negation:
- friendly vs. not friendly
morphological indicators:
- (dis)courteous, (un)interesting, (in)expensive
Discarded:
disjunctions
- “convenient but noisy location”
- “cheap but tasty food”, “dainty but strong necklace”
antonyms from dictionary (WordNet)
- round vs. square, cool vs. warm
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
32 / 42
Negation-Based Seed
Used:
explicit negation:
- friendly vs. not friendly
morphological indicators:
- (dis)courteous, (un)interesting, (in)expensive
Discarded:
disjunctions
- “convenient but noisy location”
- “cheap but tasty food”, “dainty but strong necklace”
antonyms from dictionary (WordNet)
- round vs. square, cool vs. warm
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
32 / 42
Negation-Based Seed
Used:
explicit negation:
- friendly vs. not friendly
morphological indicators:
- (dis)courteous, (un)interesting, (in)expensive
Discarded:
disjunctions
- “convenient but noisy location”
- “cheap but tasty food”, “dainty but strong necklace”
antonyms from dictionary (WordNet)
- round vs. square, cool vs. warm
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
32 / 42
Propagation Method
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
33 / 42
Propagation Method
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
33 / 42
Outline
1
Introduction
2
Aspect
Prev. Approaches
Methodology
Experiments
3
Sentiment
Prev. Approaches
Methodology
Experiments
4
Conclusions
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
34 / 42
Human Gold Standard
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
35 / 42
Evaluation Measures
Kendall’s tau coefficient (τk ) and Kendall’s distance (Dk )
G = {(a, b) : a, b ∈ Gold ∧ a ≺ b}
τk =
|Same|−|Reverse|
|G|
Dk =
|Reverse|+p·|Tied|
|G|
−1 ≤ τk ≤ +1
0 ≤ Dk ≤ 1
correlation: more is better
distance: less is better
Note: only pairs that are ordered in the gold standard are used!
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
36 / 42
Evaluation Results
Aspect
Mood
Staff
Main Dishes
Physical Atmo.
Bakery
Food - General
Wine & Drinks
Service
Average
Sam Brody
Auto.
τk ↑ Dk ↓
0.53 0.23
0.57 0.22
0.19 0.40
0.34 0.33
0.33 0.33
0.19 0.41
0.32 0.34
0.41 0.30
0.36 0.32
Lexicon
τk ↑ Dk ↓
0.56 0.22
0.60 0.20
0.38 0.31
0.25 0.37
0.35 0.33
0.41 0.30
0.52 0.24
0.54 0.23
0.45 0.27
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
37 / 42
Summary
no manual annotation
sentiment indicators missed by annotators
“neutral” adjectives convey sentiment in a certain domain
sentiment depends on aspect
warm, cheap, busy
handle misspellings and domain specific words
exelent, tastey
New-Yorky, orgasmic
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
38 / 42
Summary
no manual annotation
sentiment indicators missed by annotators
“neutral” adjectives convey sentiment in a certain domain
sentiment depends on aspect
warm, cheap, busy
handle misspellings and domain specific words
exelent, tastey
New-Yorky, orgasmic
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
38 / 42
Summary
no manual annotation
sentiment indicators missed by annotators
“neutral” adjectives convey sentiment in a certain domain
sentiment depends on aspect
warm, cheap, busy
handle misspellings and domain specific words
exelent, tastey
New-Yorky, orgasmic
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
38 / 42
Summary
no manual annotation
sentiment indicators missed by annotators
“neutral” adjectives convey sentiment in a certain domain
sentiment depends on aspect
warm, cheap, busy
handle misspellings and domain specific words
exelent, tastey
New-Yorky, orgasmic
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
38 / 42
Outline
1
Introduction
2
Aspect
Prev. Approaches
Methodology
Experiments
3
Sentiment
Prev. Approaches
Methodology
Experiments
4
Conclusions
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
39 / 42
Summary
Aspect-Sentiment Model
infers relevant information
flexible across domains
no annotation / supervision
robust to noise and error
Future Directions
aspect:
closer integration of aspect and sentiment
cross-sentence interaction
sentiment:
other sentiment indicators
other graph methods
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
40 / 42
Ask me about ...
Dependency Parsing
?
=
Machine Translation
?
=
Combining semantic and distributional similarity for:
– automatic labeling
– text simplification
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
41 / 42
Thank You!
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
42 / 42
Bibliography
Blei, David M., Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of Machine Learning Research
3:993–1022.
Fahrni, Angela and Manfred Klenner. 2008. Old Wine or Warm Beer: Target-Specific Sentiment Analysis of Adjectives. In Proc.of
the Symposium on Affective Language in Human and Machine, AISB 2008 Convention. pages 60 – 63.
Ganu, Gayatree, Noemie Elhadad, and Amelie Marian. 2009. Beyond the stars: Improving rating predictions using review text
content. In WebDB.
Griffiths, Thomas L. and Mark Steyvers. 2004. Finding scientific topics. Proceedings of the National Academy of Sciences of the
United States of America 101(Suppl 1):5228–5235.
Hatzivassiloglou, Vasileios and Kathleen R. McKeown. 1997. Predicting the semantic orientation of adjectives. In Proc. of the
35th Annual Meeting of the Association for Computational Linguistics. ACL, Madrid, Spain, pages 174–181.
Jijkoun, Valentin and Katja Hofmann. 2009. Generating a non-english subjectivity lexicon: Relations that matter. In Proc. of the
12th Conference of the European Chapter of the ACL (EACL 2009). ACL, Athens, Greece, pages 398–405.
Popescu, Ana-Maria and Oren Etzioni. 2005. Extracting product features and opinions from reviews. In HLT ’05: Proc. of the
conference on Human Language Technology and Empirical Methods in Natural Language Processing. ACL, Morristown, NJ,
USA, pages 339–346.
Titov, Ivan and Ryan McDonald. 2008a. Modeling online reviews with multi-grain topic models. In WWW ’08: Proc. of the 17th
international conference on World Wide Web. ACM, New York, NY, pages 111–120.
Titov, Ivan and Ryan McDonald. 2008b. A joint model of text and aspect ratings for sentiment summarization. In Proc. of ACL-08:
HLT . ACL, Columbus, Ohio, pages 308–316.
Turney, Peter. 2002. Thumbs up or thumbs down? semantic orientation applied to unsupervised classification of reviews. In Proc.
of 40th Annual Meeting of the Association for Computational Linguistics. ACL, Philadelphia, Pennsylvania, USA, pages
417–424.
Sam Brody
An Unsupervised Aspect-Sentiment Model for Online Reviews
May 2010
42 / 42