
A Statistical Parsing Framework
for Sentiment Classification
ProbModels@ILCC
Li Dong
Sentence-Level Sentiment Classification
Input: sentence
Output: polarity label (e.g., positive / negative)
One of the most challenging problems is:
Sentiment composition
Two Mainstream Methods
Lexicon-based
Lexicons (funny, dislike) + Rules (not *, * but *)
Pros: simple, interpretable
Cons: hard to scale (lexicons and rules are hand-crafted)
Classifier-based
Classifier (SVM, MaxEnt) + Features (n-gram, POS)
Pros: data-driven, coverage
Cons: needs ad hoc tricks to handle sentiment composition
Revisit Sentiment Composition
Two key components
Lexicon
Rule
Q1: Can we learn them from data?
The movie is just so so (Neg), but i still like it (Pos).
Neg but Pos -> Pos
Revisit Sentiment Composition
Sentiment composition is not only about polarity
P(Pos|very good) > P(Pos|good)
Q2: Can we model polarity strength?
The movie is just so so (Neg), but i still like it (Pos).
Neg but Pos -> Pos
Revisit Sentiment Composition
Latent sentiment structure
Q3: How do we define the sentiment structure?
The movie is just so so (Neg), but i still like it (Pos).
Neg but Pos -> Pos
Revisit Sentiment Composition
Latent sentiment structure
Q4: Can we use only (sentence, polarity) pairs to learn the latent sentiment structure?
The movie is just so so (Neg), but i still like it (Pos).
Neg but Pos -> Pos
Overview
Q1: learn lexicons and rules?
Q2: calculate polarity strength?
Q3: define sentiment structure?
Q4: learn structure from (sentence, polarity) pairs?
Comparison with Semantic Parsing
Sentiment parsing
p(label | sentence, model) = Σ_tree p(label | tree, polarity model) × p(tree | sentence, parsing model)
p(label | tree, polarity model): deterministic
p(tree | sentence, parsing model): probabilistic
Semantic parsing
p(answer | question, model) = Σ_representation p(answer | representation, database) × p(representation | question, parsing model)
Sentiment parsing             | Semantic parsing
sentiment lexicons            | lexical triggers
(latent) sentiment tree       | (latent) meaning representation
(sentence, polarity) pairs    | (question, answer) pairs
calculate polarity strength   | execute query
[Framework overview figure: a latent sentiment grammar]
Sentiment Grammar
Built upon context-free grammar (CFG)
G_s = <V_s, Σ_s, S, R_s>
V_s = {N, P, S, E}: non-terminal set
Σ_s: terminal set
S: start symbol
R_s: rewrite rule set
Sentiment Grammar
Dictionary rules
P -> I still like it
P -> good
Sentiment Grammar
Combination rules
P -> N but P
P -> not P
P -> very P
Sentiment Grammar
Glue rules
P -> N P
N -> N N
Sentiment Grammar
OOV rules
Out-Of-Vocabulary text spans
Sentiment Grammar
Auxiliary rules
Combine out-of-vocabulary span and non-terminal
Sentiment Grammar
Start rules
S -> P
S -> N
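For concreteness, here is a toy encoding of these rule categories (illustrative only, not the authors' data structures); lowercase strings are terminals, uppercase symbols are non-terminals, and the example rules follow those shown in this talk.

```python
# Each rule is (left-hand side, right-hand side).
DICTIONARY_RULES = [
    ("P", ("good",)),
    ("P", ("i", "still", "like", "it")),
]
COMBINATION_RULES = [
    ("P", ("N", "but", "P")),
    ("N", ("not", "P")),
    ("P", ("very", "P")),
]
GLUE_RULES = [
    ("P", ("N", "P")),
    ("N", ("N", "N")),
]
START_RULES = [
    ("S", ("P",)),
    ("S", ("N",)),
]
# OOV and auxiliary rules (for out-of-vocabulary spans) are omitted from this sketch.
```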
Parsing Model
Present inference rules using a deductive proof system:
each rule derives a new item (Then) from already-derived items (If)
(Shieber, Schabes, and Pereira 1995; Goodman 1999)
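As a concrete illustration, one inference rule can be written in standard deductive-parsing notation; the item form [X, i, j], meaning non-terminal X derives words i..j, is assumed here for illustration.

```latex
% Illustrative CKY-style inference rule: from an N item and an adjacent P item,
% derive a P item over the combined span using the glue rule P -> N P.
\[
\frac{[N,\, i,\, k] \qquad [P,\, k,\, j]}
     {[P,\, i,\, j]}
\quad P \rightarrow N\ P
\]
```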
Inference Rules
P -> good
N -> not P
P -> N but P
P -> N P
E -> ,
P -> E P
P -> P E
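As an illustration, these rules together with the earlier dictionary rule P -> i still like it license the following deduction; the phrase is constructed for this example, not taken from the slides.

```latex
% Deriving a positive item over "not good but i still like it":
\begin{align*}
P \rightarrow \text{good} &:\ [P,\ \text{good}]\\
N \rightarrow \text{not } P &:\ [N,\ \text{not good}]\\
P \rightarrow \text{i still like it} &:\ [P,\ \text{i still like it}]\\
P \rightarrow N \text{ but } P &:\ [P,\ \text{not good but i still like it}]
\end{align*}
```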
Polarity Model
Calculate the polarity strength of a span from its subspans
Polarity Model
Two constraints for polarity strength
Non-negative
Normalized to 1
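In symbols (notation assumed), for any text span w:

```latex
\[
P(\mathrm{Pos}\mid w) \ge 0,\qquad
P(\mathrm{Neg}\mid w) \ge 0,\qquad
P(\mathrm{Pos}\mid w) + P(\mathrm{Neg}\mid w) = 1 .
\]
```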
Polarity Model
Dictionary rules
P -> I still like it
P -> good
Estimated from data
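One simple count-based estimator consistent with this (an illustrative assumption; the negation handling discussed later refines the counts):

```latex
% Relative-frequency estimate of a dictionary rule's strength (illustrative only):
\[
P(\mathrm{Pos} \mid \text{good}) \approx
\frac{\#\{\text{positive sentences containing ``good''}\}}
     {\#\{\text{sentences containing ``good''}\}}
\]
```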
Polarity Model
Glue rules
P -> N P
P -> P N
P -> P P
N -> N N
Polarity Model
Auxiliary rules
Combine an OOV span with a non-terminal
The OOV span is ignored (the parent keeps the non-terminal's polarity strength)
Polarity Model
Combination rules
P -> N but P
P -> not P
P -> very P
Logistic regression
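A minimal sketch of the idea (not the authors' implementation): assume each combination rule has its own parameters (w, b) and the parent's positive strength is a logistic function of the child's positive strength; the parameter values below are made up for illustration.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative per-rule parameters (w, b); in the framework they are learned from data.
RULE_PARAMS = {
    "N -> not P":  (-3.0, 1.5),   # flips strong positives into negatives
    "P -> very P": ( 4.0, -1.0),  # intensifies positive strength
}

def compose_pos_strength(rule, child_pos):
    """Positive strength of the parent span as a logistic function of the child's."""
    w, b = RULE_PARAMS[rule]
    return logistic(w * child_pos + b)      # P(Neg | parent) = 1 - returned value

# e.g. starting from P(Pos | "good") = 0.8:
print(compose_pos_strength("P -> very P", 0.8))   # ~0.90  ("very good" is more positive)
print(compose_pos_strength("N -> not P", 0.8))    # ~0.29  ("not good" is mostly negative)
```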
Why is logistic regression good?
Negation (N -> not P)
Switch negation (Choi and Cardie 2008; Sauri 2008)
• Simply reverse the strength
• P(Neg|not good) = P(Pos|good)
Shift negation (Taboada et al. 2011)
• Problem with switch negation: P(Neg|not very good) > P(Neg|not good)
• Solution: P(Neg|not good) = P(Pos|good) - fixed_value
Why is logistic regression good?
Intensification (P -> extremely P)
Fixed intensification (Polanyi and Zaenen 2006; Kennedy and Inkpen 2006)
• P(Pos|very good) = P(Pos|good) + fixed_value (> 0)
Percentage intensification (Taboada et al. 2011)
• P(Pos|very good) = P(Pos|good) * fixed_value (!= 1)
Why is logistic regression good?
Reasons
Smooths polarity strength into (0, 1)
Can learn various types of negation and intensification
Can handle contrast (P -> N but P)
Inference Rules (w/ Polarity Model)
P -> good
N -> not P
P -> N but P
P -> N P
E -> ,
P -> E P
P -> P E
Constraints (in Parsing Model)
Add a side condition C to the inference rules
Constraints (in Parsing Model)
Polarity should be consistent with non-terminal
Avoid improperly using combination rules for
neutral phrase
Do not use P-> a lot of P for P->a lot of people
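One way to write such a side condition; this exact formulation is an assumption rather than a quote from the talk.

```latex
% Side condition C: only keep an item if the polarity model agrees with its non-terminal.
\[
C:\quad
\begin{cases}
P(\mathrm{Pos}\mid w) > P(\mathrm{Neg}\mid w), & \text{for an item labeled } P \text{ over span } w,\\
P(\mathrm{Neg}\mid w) > P(\mathrm{Pos}\mid w), & \text{for an item labeled } N \text{ over span } w.
\end{cases}
\]
```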
Inference Rules (w/ Polarity Model and Constraints)
P -> good
N -> not P
P -> N but P
P -> N P
E -> ,
P -> E P
P -> P E
Ranking Model
The parsing model generates many candidate trees T(s)
Use a log-linear model to rank and score the candidates:
p(t | s; θ) = exp(θ · φ(s, t)) / Z(s)
θ: weight vector
φ(s, t): feature vector
log Z(s) = log Σ_{t' ∈ T(s)} exp(θ · φ(s, t')): log-partition function
Bottom-Up Decoding
Ranking features decompose along the tree, so CYK-style dynamic programming can be used for decoding
The predicted polarity is read off from the best-scoring tree found by decoding
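To make the dynamic programming concrete, here is a recognition-only toy chart parser over a handful of rules from this talk. It is only a sketch: the real decoder additionally scores items with the polarity and ranking models, applies the side conditions, and keeps the best candidates per cell.

```python
# Toy recognition-only chart parser (CYK-style bottom-up dynamic programming).
# Rules: (lhs, rhs); lowercase items are terminals, uppercase are non-terminals.
RULES = [
    ("P", ("good",)),
    ("P", ("i", "still", "like", "it")),
    ("N", ("not", "P")),
    ("P", ("very", "P")),
    ("P", ("N", "but", "P")),
    ("P", ("N", "P")), ("N", ("N", "N")),     # glue rules
    ("S", ("P",)), ("S", ("N",)),             # start rules
]

def parse(words):
    n = len(words)
    # chart[i][j] = set of non-terminals that derive words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]

    def matches(rhs, i, j):
        """Can the symbol sequence rhs derive words[i:j]? (brute-force segmentation)"""
        if not rhs:
            return i == j
        sym, rest = rhs[0], rhs[1:]
        for k in range(i + 1, j + 1):
            head_ok = (sym.islower() and k == i + 1 and words[i] == sym) or \
                      (sym.isupper() and sym in chart[i][k])
            if head_ok and matches(rest, k, j):
                return True
        return False

    for span in range(1, n + 1):              # bottom-up over span lengths
        for i in range(n - span + 1):
            j = i + span
            changed = True
            while changed:                    # re-check to propagate unary rules (S -> P)
                changed = False
                for lhs, rhs in RULES:
                    if lhs not in chart[i][j] and matches(rhs, i, j):
                        chart[i][j].add(lhs)
                        changed = True
    return chart[0][n]

print(parse("not good but i still like it".split()))   # prints a set containing 'P' and 'S'
```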
Ranking Model Training
Maximize the probability of decoding the correct labels:
Σ_(s, l) log Σ_{t ∈ T(s): polarity(t) = l} p(t | s; θ) - regularizer(θ)
(the inner sum is the likelihood of the trees that yield the correct polarity label)
Example: for the training pair (a b c d, P), sum over all candidate trees of "a b c d" whose root polarity is P.
Learning Ranking Model
EM-like training (Liang, Jordan, and Klein 2013)
Each iteration: for every sentence, beam search generates the K-best candidate trees, the current model scores the candidates, and the parameters are updated with AdaGrad.
The K-best lists change across iterations as the model improves (different trees enter the beam in iteration N than in iteration 1).
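A schematic of this loop; the beam_search helper, the feature representation, and all hyper-parameter values are placeholders rather than the authors' implementation.

```python
import math
from collections import defaultdict

def train(data, beam_search, k=10, iterations=20, eta=0.1, lam=1e-4):
    """EM-like training sketch (illustrative only).
    data: list of (sentence, gold_label).
    beam_search(sentence, theta, k): assumed helper returning K-best candidate
    trees as (feature_dict, root_label) pairs under the current weights."""
    theta = defaultdict(float)       # feature weights of the log-linear ranking model
    grad_sq = defaultdict(float)     # accumulated squared gradients for AdaGrad

    for _ in range(iterations):
        for sentence, gold in data:
            candidates = beam_search(sentence, theta, k)
            scores = [sum(theta[f] * v for f, v in feats.items())
                      for feats, _ in candidates]
            if not scores:
                continue
            m = max(scores)
            probs = [math.exp(s - m) for s in scores]
            z = sum(probs)
            probs = [p / z for p in probs]

            z_good = sum(p for p, (_, lab) in zip(probs, candidates) if lab == gold)
            if z_good == 0.0:
                continue

            # Gradient of log p(gold | sentence) = log(sum of probs of trees with gold label).
            grad = defaultdict(float)
            for p, (feats, lab) in zip(probs, candidates):
                w = (p / z_good if lab == gold else 0.0) - p
                for f, v in feats.items():
                    grad[f] += w * v

            # AdaGrad ascent step with an L2 regularizer.
            for f in set(grad) | set(theta):
                g = grad[f] - lam * theta[f]
                grad_sq[f] += g * g
                theta[f] += eta * g / (math.sqrt(grad_sq[f]) + 1e-8)
    return theta
```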
Learning Sentiment Grammar and Polarity Model
Dictionary rules (P -> good, N -> i dislike this)
Mine frequent fragments as candidates
Prune them using polarity strength
Problem
This is not a good movie. (negative)
Solution
Consider negation rules when learning the polarity model for dictionary rules
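A rough sketch of how such counting might look; the negator list, the fixed negation window, and the label-flipping heuristic are assumptions for illustration, not the paper's exact procedure.

```python
from collections import Counter

NEGATORS = {"not", "no", "never", "n't"}   # assumed negation cues, for illustration only

def estimate_dictionary_strengths(corpus, fragments, min_count=5):
    """corpus: list of (tokens, label) pairs with label in {"Pos", "Neg"}.
    fragments: candidate phrases (token tuples) mined as frequent n-grams.
    Returns {fragment: estimated P(Pos | fragment)}.  The sentence label is
    flipped when the fragment is preceded by a negator (a crude stand-in for
    'consider negation rules when counting')."""
    pos, total = Counter(), Counter()
    frag_set = set(fragments)
    for tokens, label in corpus:
        for i in range(len(tokens)):
            for j in range(i + 1, min(i + 5, len(tokens)) + 1):
                frag = tuple(tokens[i:j])
                if frag not in frag_set:
                    continue
                negated = any(t in NEGATORS for t in tokens[max(0, i - 3):i])
                effective = label if not negated else ("Neg" if label == "Pos" else "Pos")
                total[frag] += 1
                if effective == "Pos":
                    pos[frag] += 1
    return {f: pos[f] / total[f] for f in total if total[f] >= min_count}
```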
Learning Sentiment Grammar and Polarity Model
Combination rules (N -> not P)
Generalize dictionary rules by replacing an embedded dictionary entry with its non-terminal (e.g., the negative phrase "not good" generalizes to N -> not P)
Polarity model: logistic regression
Learning Sentiment Grammar and Polarity Model
Iterative process: count fragments (taking negation rules into account) to build dictionary rules and their polarity model; generalize dictionary rules into combination rules; fit the combination rules' polarity model with logistic regression.
Experiments
Datasets
Critic reviews: RT-C, PL05-C, SST
User reviews: RT-U, IMDB-U
MPQA
Experiments
[Bar chart: classification accuracy (%, y-axis 55-90) on RT-C, PL05-C, SST, RT-U, IMDB-U, and MPQA for SVM, MNB, LM, Voting-w/Rev, HardRule, TreeCRF, RAE, MVRNN, and s.parser]
Ablation
Heuristic ranking of trees
Without combination rules
[Bar chart: accuracy (%, y-axis 70-90) of LongMatch, w/o Comb, and s.parser on RT-C, PL05-C, SST, RT-U, IMDB-U, and MPQA]
Combination Rules
Switch negation (Choi and Cardie 2008; Sauri 2008)
Simply reverse strength
P(Neg|i do not like) = P(Pos|like)
Shift negation (Taboada et al. 2011)
P(Neg|is not good) = P(Pos|good) - fixed_value
Conclusion
[Framework recap figure: statistical sentiment parsing with a latent sentiment grammar]
Many Thanks
P -> thanks
P -> many P